Preface


This book attempts to give students a unified introduction to the models, 
methods, and theory of modern linear algebra. Linear models are now used 
at least as widely as calculus-based models. The world today is commonly 
thought to consist of large, complex systems with many input and output 
variables. Linear models are the primary tool for analyzing these systems. 
A course based on this book (or one like it) should prove to be the most 
useful college mathematics course most students ever take. With this goal 
in mind, the material is presented with an eye toward making it easy to 
remember, not just for the next hour test but for a lifetime of diverse uses.

Linear algebra is an ideal subject for a lower-level college course in 
mathematics, because the theory, numerical techniques, and applications are 
interwoven so beautifully. The theory of linear algebra is powerful, yet easily 
accessible. Best of all, theory in linear algebra is likable. It simplifies and 
clarifies the workings of linear models and related computations. This is 
what mathematics is really about, making things simple and clear. It provides 
important answers that go beyond results we could obtain by brute compu- 
tation. For too many students, mathematics is either a collection of tech- 
niques, as in calculus, or a collection of formal theory with limited appli- 
cations, as in most courses after calculus (including traditional linear algebra 
courses). This book tries to rectify this artificial dichotomy. 

Again, the applications of linear algebra are powerful, easily under- 
stood, and very diverse. This book introduces students to economic in- 
put—output models, population growth models, Markov chains, linear pro- 
gramming, computer graphics, regression and other statistical techniques, 
numerical methods for approximate solutions to most calculus problems,
linear codes, and much more. These different applications reinforce each 
other and associated theory. Indeed, without these motivating applications, 
several of the more theoretical topics could not be covered in an introductory 
textbook. 

The field of linear numerical analysis is very young, having been de- 
pendent on digital computers for its development. This field has wrought 
major changes in what linear algebra theory should be taught in an intro- 
ductory course. The standout example of such a modern linear algebra text 
is G. Strang’s Linear Algebra and Its Applications. Once the theory was 
needed as an alternative to numerical computation, which was hopelessly 
difficult. Now theory helps direct and interpret the numerical computation, 
which computers do for us. 

Overview of the Text This book develops linear algebra around mat- 
rices. Vector spaces in the abstract are not considered, only vector spaces 
associated with matrices. This book puts problem solving and an intuitive 
treatment of theory first, with a proof-oriented approach intended to come 
in a second course, the same way that calculus is taught. 

The book’s organization is straightforward: Chapter 1 has introductory 
linear models; Chapter 2 has the basics of matrix algebra; Chapter 3 develops 
different ways to solve a system of equations; Chapter 4 has applications, 
and Chapter 5 has vector-space theory associated with matrices and related 
topics such as pseudoinverses and orthogonalization. Many linear algebra 
textbooks start immediately with Gaussian elimination, before any matrix 
algebra. Here we first pose problems in Chapter 1, then develop a mathe-
matical language for representing and recasting the problems in Chapter 2,
and then look at ways to solve the problems in Chapter 3—four different 
solution methods are presented with an analysis of strengths and weaknesses 
of each. 

In most applications of linear algebra, the most difficult aspect is un-
derstanding matrix expressions, such as Ue^D U^{-1}. Students from a traditional
linear algebra course have little preparation for understanding such expres- 
sions. This book constantly forces students to interpret the meaning of matrix 
expressions, not just perform rote computations. Matrix notation is used as 
much as possible, rather than constantly writing out systems of equations. 
The sections are generally too long to be covered completely in class; most 
have several examples (based on familiar models) that are designed to be 
read by students on their own without explanation by the instructor. The 
goal is for students to be able to read and understand uses of matrix algebra 
for themselves. 

The material is unified pedagogically by the repeated use of a few 
linear models to illustrate all new concepts and techniques. These models 
give the student mental pictures to ‘‘visualize’’ new ideas during this course 
and help remember the ideas after the course is over. 

Although this book is often informal (‘‘proving theorems’’ by example)
and sticks mainly to matrices rather than general linear transformations, it 
covers several topics normally left to a more advanced course, such as matrix 
norms, matrix decompositions, and approximation by orthogonal polyno-
mials. These advanced topics find immediate, concrete applications. In ad-
dition, they are finite-dimensional versions of important theory in functional
analysis; for example, the eigenvalue decomposition of a matrix into simple 
matrices is a special case of the spectral representation of linear operators. 

Discrete Versus Continuous Mathematics Today there is a major cur- 
riculum debate in the mathematics community between computer sci- 
ence—oriented discrete mathematics and classical calculus-based mathemat- 
ics. Linear algebra, especially as viewed in this book, is right in the middle 
of this debate. (Linear algebra and matrices have always been in the middle 
of such debates. Matrices were a core topic in the best-known first-year 
college mathematics text before 1950, Hall and Knight’s College Algebra; 
and much of Kemeny, Snell, and Thompson's Introduction to Finite Math-
ematics involved new applications of linear algebra: Markov chains and
linear programming.)

This book attempts to present a healthy interplay between mathematics 
and computer science, that is, between continuous and discrete modes of 
thinking. The complementary roles of continuous and discrete thinking are 
typified by the different uses of the euclidean norm (l_2-norm) and sum norm
(l_1-norm) in this book. An important example of computer science thinking
in this book is matrix representations, such as the LU decomposition. They 
are viewed as a way to preprocess the data in a matrix in order to be ready 
to solve quickly certain types of matrix problems. 

We note that computer science even gives insights into the teaching 
of any linear algebra course. A computer scientist’s distinction between high- 
level languages (such as PASCAL) and low-level languages (such as assem- 
bly language) applies to linear algebra proofs: A high-level proof involves 
matrix notation, such as B^T A^T = (AB)^T, while a low-level proof involves
individual entries a_ij, such as c_ij = Σ_k a_ik b_kj.

Suggested Course Syllabus This book contains more than can be cov- 
ered in the typical first-semester sophomore course for which it is intended. 
Most of Chapters 1, 2, and 3 and the first four sections of Chapter 5 should 
normally be covered. A freshman course would skip Chapter 5. In addition, 
selected sections of Chapter 4 can be chosen based on available time and 
the class’s interests. For the student, the essence of any course should be 
the homework. This book has a large number of exercises at all levels of 
difficulty: computational exercises, applications, and proofs of much of the 
basic theory (with extensive hints for harder proofs). For more information 
about course outlines, plus suggested homework sets, sample exams, and 
additional solutions of exercises, see the accompanying Instructor’s Manual. 

At the end of the book is a list of various programming languages and 
software packages available for performing matrix operations. It is recom- 
mended that students have access to computers with ready matrix software 
in the first week, 
relatives. My father, A. W. Tucker, ignited and nurtured my born-again
portive atmosphere that eased the long hours of writing; more concretely,
Lisa’s calculus project on cubic splines became the appendix to Section 4.7. 
Section 1.1 Mathematical Models


This book is concerned with ways to organize and analyze complex systems. 
From the time of the ancient Greeks until the middle of the twentieth century, 
scientists concentrated on problems involving a small number of variables. 
The calculus of Newton and Leibniz dealt with functions of a single vari-
able, later generalized to several variables. Although the functions studied
in calculus can display very complex behavior, the amount of input data is 
usually quite small. 

Today, scientists face problems involving large amounts of data. Con- 
sider the following examples of modern complex systems. 


1. A mathematical model of the U.S. economy that considers the interactive 
effects of supplies and demands of various goods. The model may involve 
thousands of variables and equations. 

2. The task of routing long-distance telephone calls. Every second, thou- 
sands of calls must be instantly routed from various origins to destinations 
through many intermediate switching stations. The system doing the rout- 
ing procedure must look for circuitous indirect routes when more direct 
pathways are saturated. 

3. Statistical studies of factors implicated in the spread of some new disease. 
Hundreds of causative agents must be analyzed for interactive effects; 
whereas neither effect A alone nor effect B alone may make a person 
susceptible to the illness, effects A and B together make a person very 
susceptible. 
4. A mathematical model simulating the airflow around a jet aircraft whose
design requires thousands of parameters to describe. 


Although this book does not contain 100-, much less 1000-variable 
problems (5 variables are usually sufficient for realistic examples), it is 
concerned with mathematical techniques that can readily be applied to such 
large problems. 

In this opening chapter we present some basic models that are used 
throughout the book to illustrate concepts of matrix theory and associated 
computations. In Chapter 2 we develop the basic tools of matrix algebra. In 
Chapter 3 we present various ways to solve a system of linear equations. In 
Chapter 4 we use matrix algebra to analyze a collection of models in greater 
depth. In Chapter 5 we discuss the theory of solutions to systems of equa- 
tions. 

Intuitively, the more information we put into a model, the more ac- 
curate should be the analysis obtained. The problem is, how do we handle 
all this information? How do we construct sensible models using hundreds 
of inputs when we do not really know the underlying mechanism by which 
the input variables affect the model? In the ‘‘old’’ days when scientists 
studied simpler systems, very accurate mathematical models were obtained, 
say, to describe a spherical body’s rate of fall based on three critical param- 
eters: the time elapsed since the body was dropped, the body’s density, and 
the density of air. When hundreds of interdependent variables are involved, 
there is little chance of obtaining a precise mathematical model. 

If we do not understand well the system we are modeling, then the 
structure of our mathematical model should be simple. But the model must 
still be useful—tell us things about the system that we could not otherwise 
easily find out. We shall see, as we work through this book, that a linear
mathematical model is often the best choice. Before defining a linear model, 
let us state what we mean by a mathematical model in general. 

A mathematical model is a mathematical formulation of some class 
of real-world problems. The formulation may be a set of equations, it may 
be the minimization of some function, or it may involve integrals. The model 
may embody various constraints on its variables or it may be a combination 
of other mathematical models. Part of the modeling process involves input 
values that vary from one instance of the problem to another. These values 
are coefficients and constants for equations in the model. 

Let us consider five simple mathematical models. The first is a physics 
model derived with calculus. The other four are standard high school algebra 
problems. 


Example 1. Falling Objects 


In physics, the height H of an object dropped off a building is modeled
by the formula

H = -16T^2 + H_0    (1)

where T is time (in seconds) elapsed since the object was dropped,
and H_0 is the height (in feet) of the building. (The numbers 16 and H_0
are the input values of the model.) One can derive this formula with
calculus (if drag from air resistance is ignored).

Figure 1.1 Equation of falling object, H = -16T^2 + 100.

Suppose that the building is 100 feet tall. Figure 1.1 gives the
graph of the equation H = -16T^2 + 100. When 2 seconds have
elapsed, the object’s current height can be computed to be

H = -16(2)^2 + 100 = -64 + 100 = 36

To determine the time until the object hits the ground, we set
H = 0 and solve for T:

0 = -16T^2 + 100  →  16T^2 = 100
                  →  T^2 = 100/16 = 25/4
                  →  T = 5/2 (that is, 2.5 seconds)
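
The computations in this example can be checked by machine. The following is a minimal sketch in Python, not part of the original text; the function names height and impact_time are chosen here for illustration, and the model is the same drag-free formula H = -16T^2 + H_0 used above.

```python
import math

def height(t, h0):
    """Height in feet after t seconds for an object dropped from h0 feet."""
    return -16 * t**2 + h0

def impact_time(h0):
    """Time in seconds at which the height reaches 0, from 16T^2 = h0."""
    return math.sqrt(h0 / 16)

print(height(2, 100))     # height after 2 seconds from a 100-foot building
print(impact_time(100))   # seconds until the object hits the ground
```

For a 100-foot building the sketch reproduces the height of 36 feet after 2 seconds and an impact time of 2.5 seconds.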


Example 2. Elementary Algebra: 
A Problem of Relative Ages 


Consider the following word problem. Michael is three times the age 
of his sister, but in 6 years he will be only twice his sister’s age. How 
old is Michael now? 

We want to model the information in the word problem with 
algebraic equations. If M is Michael’s current age and S is his sister’s 
age, we have 


M = 3S    (2)
M + 6 = 2(S + 6)    (3)


This model is simply an algebraic restatement of information 
given in words. We were told that certain quantities were equal; for 
example, Michael’s current age equals three times his sister’s current 
age, and we expressed these equalities in symbolic (algebraic) form. 
There are two variables in this problem, but the relation between
the two variables M and S is so simple—one is triple the other—that 
one can easily rewrite the second equation (3) in terms of one of the 
variables. If we write (3) in terms of S by substituting 3S for M, we 
obtain 


(3S) + 6 = 2(S + 6) (3') 


Solving equation (3’), we obtain S = 6, so M = 3S = 18.


We now consider a slightly more complicated age problem. 




Example 3. Another Problem of Relative Ages 


Alice is currently twice as old as her brother Bill. If twice the sum of
their current ages is equal to the product of their ages 4 years ago,
how old is Bill?


If A represents Alice’s current age and B Bill’s current age, the 
given information can be modeled by the system of two equations 


A = 2B
2(A + B) = (A - 4)(B - 4)    (4)


Expanding and simplifying the second equation, we obtain


A = 2B
AB - 6A - 6B + 16 = 0    (5)


Note that this model involves a term of ‘‘degree 2,’’ the product AB
in the second equation.

Substituting 2B for A in the second equation yields

(2B)B - 6(2B) - 6B + 16 = 0    or    B^2 - 9B + 8 = 0    (6)

Factoring the right equation in (6) yields

(B - 1)(B - 8) = 0    (7)

and thus B = 1 or B = 8.

The solution B = 1 is not possible in terms of the problem
statement (if Bill is currently 1 year old, what was he 4 years ago?).
Thus the answer is: Bill is 8 (and Alice is 16).
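
The step of discarding a root that makes no sense in the real-world problem can also be written out explicitly. The sketch below, in Python, solves B^2 - 9B + 8 = 0 with the quadratic formula (a standard technique substituted here for the factoring used above) and keeps only roots for which Bill was already born 4 years ago. The function name bill_ages and the cutoff of 4 years are our own assumptions about how to encode that check.

```python
import math

def bill_ages():
    """Solve B^2 - 9B + 8 = 0 and keep roots that fit the word problem."""
    a, b, c = 1, -9, 8
    disc = b * b - 4 * a * c                      # discriminant
    roots = [(-b - math.sqrt(disc)) / (2 * a),
             (-b + math.sqrt(disc)) / (2 * a)]
    # Assumption: Bill must have been born 4 years ago, so B >= 4.
    return [r for r in roots if r >= 4]

for B in bill_ages():
    A = 2 * B                                     # Alice is twice Bill's age
    # Check the original condition 2(A + B) = (A - 4)(B - 4).
    print(B, A, 2 * (A + B) == (A - 4) * (B - 4))
```

Only the root B = 8 survives the check, in agreement with the answer above.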


Although both (B = 1, A = 2) and (B = 8, A = 16) satisfy algebraic
system (4), only the second pair of values makes sense in the original real-
world problem. This illustrates an important aspect of modeling that is fre-
quently assumed in this book: namely, interpreting a mathematical solution
in terms of the original real-world problem to see that the solution makes
sense. Figure 1.2 provides a picture of the major steps in the modeling
process.

Figure 1.2 Modeling process. [Diagram linking the real world, symbolic representation, the mathematical model, mathematical theories and techniques, mathematical conclusions, interpretation, predictions, and verification.]


Example 4. Speed of a Canoe


Consider another classic word problem. When Mary paddles a canoe 
up a river (against the river’s flow), the canoe goes 3 miles per hour, 
and when she paddles downstream the canoe goes 11 miles per hour. 
How fast would the canoe be going if she were paddling on a still 
lake? 

If C is the canoe’s speed in still water and R is the speed of the 
river, then algebra books model the problem with the two equations 


C + R = 11
C - R = 3    (8)


This model expresses the upstream and downstream speeds as functions 
of C and R by using the intuitive physical principle that the net up- 
stream speed is the canoe speed minus the river speed, and that the 
net downstream speed is the canoe speed plus the river speed. 

Figure 1.3 has a graph of the two equations in (8). We see that 
they intersect at the point C = 7, R = 4. This solution can also be 
obtained by solving (8) algebraically.
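
Solving the two canoe equations algebraically is a short computation that a machine can carry out as well. The following Python sketch uses Cramer's rule for a system of two equations, which is one of several ways to do this and not necessarily the method an algebra book would use; the helper name solve_2x2 is hypothetical.

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y

# Downstream: C + R = 11.  Upstream: C - R = 3.
C, R = solve_2x2(1, 1, 11, 1, -1, 3)
print(C, R)
```

The result agrees with the intersection point C = 7, R = 4 read from the graph.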


Figure 1.3 Graphical solution of canoe 
problem. 
Examples 2 and 4 are called linear models because the mathematical
equations in the model are linear, that is, equations of lines. Examples 1
and 3 are nonlinear models because they contain nonlinear equations. A
linear equation involves sums and differences of variables or multiples of 
variables, such as 


y = 2x - 6,    x = y/2 + 3    (9a)


4x - 2y = 12    (9b)


Note that the three preceding equations are all equivalent. 

The left sides in (8) and (9b) are called linear combinations of vari- 
ables: sums and differences of variables or of multiples of variables. A linear 
combination is the simplest type of expression of variables that one can
build.

The widespread use of linear models results primarily from the ease 
of computing linear expressions as well as the existence of a powerful theory 
for analyzing linear models. Even when a problem is nonlinear, a linear 
model will often be used as an approximation. For example, for small values 
of x, sin x is often approximated by x. However, the reader should not
expect that every situation has a satisfactory linear model, as the next example
shows.


Example 5. Speed of Canoe with Sail 


Suppose that the canoe has a sail mounted at its front. The canoe is 
on a lake with a wind of W miles per hour. We assume that downwind 
speed (moving with the wind) is again C + W, the sum of the canoe’s 
speed (paddled by Mary) plus the wind’s speed. However, boats with 
sails can move upwind by an aerodynamic principle, the same principle 
that holds up airplanes. So we try a linear combination of the C and 
W of the form C + kW, where k is a constant to be determined. If U 
is the upwind speed and D the downwind speed, we obtain the equa- 
tions 


Downwind:  C + W = D
Upwind:    C + kW = U    (10)


Let us consider a numerical example. Suppose that our downwind 
speed D is 7 and our upwind speed U is 5. We try our model—the 
equations in (10)—with k values of k = .75 and k = 1.


k = .75:  C + W = 7
          C + .75W = 5    (11)

k = 1:    C + W = 7
          C + W = 5       (12)

Figure 1.4 Equations for canoe with sail.


Obviously, the two equations in (12) can never be satisfied simul-
taneously. In (11) with k = .75, there are also difficulties. Subtract-
ing the second equation in (11) from the first, we obtain .25W = 2 or
W = 8. Since C + W = 7, then C = 7 - W = 7 - 8 = -1. A
negative canoe speed is impossible (we assume that Mary is not cheat- 
ing by paddling backwards). 

A good way to see what is happening is with a graph. In Figure 
1.4, we have plotted the lines in (11). Because these two lines do not 
meet inside the positive quadrant, there is no feasible solution. If we 
change our estimate for the constant k slightly from .75 to 2/3, then we
obtain a solution W = 6, C = 1, which might be close to the true 
value of W and C. 

Instead of assigning specific values to k, U and D, let us solve 
the two equations of (10) for C and W in terms of these general pa- 
rameters. If we subtract the upwind equation from the downwind equa- 
tion, we have 


C + W = D
C + kW = U
(1 - k)W = D - U    or    W = (D - U)/(1 - k)    (13)


If we substitute the formula for W found in (13) into the downwind
equation, we obtain

C + (D - U)/(1 - k) = D

or

C = D - (D - U)/(1 - k) = (U - kD)/(1 - k)    (14)



Observe that whereas originally D and U were expressed as linear 
combinations of C and W, we have now expressed C and W as linear 
combinations of D and U. However, the critical factor here is k. Recall 
that the parameter k is supposed to allow us to express the upwind 
speed with a sail as a linear combination of C and W. In (13) and (14), 
C and W depend on k in a nonlinear fashion: When k approaches 1,
the denominator 1 - k in (13) and (14) approaches 0, so the values
of C and W blow up.

This sensitivity to small changes in k near 1 makes this model
inherently poor. More generally, it appears that a linear combination
such as C + kW of C and W is unable to model properly the upwind
speed of a canoe with sail. 
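
Formulas of this kind are convenient to experiment with. As a hedged illustration (the function name sail_model is ours, and exact fractions are used only to avoid roundoff), the following Python sketch solves the downwind and upwind equations C + W = D and C + kW = U for C and W and reports whether the result is feasible, that is, whether both speeds are nonnegative.

```python
from fractions import Fraction

def sail_model(D, U, k):
    """Solve C + W = D and C + k*W = U for (C, W); requires k != 1."""
    D, U, k = Fraction(D), Fraction(U), Fraction(k)
    if k == 1:
        raise ValueError("k = 1: the system has no unique solution")
    W = (D - U) / (1 - k)     # subtract the upwind equation from the downwind one
    C = D - W                 # substitute W back into the downwind equation
    return C, W

for k in ("3/4", "2/3"):
    C, W = sail_model(7, 5, k)
    status = "feasible" if C >= 0 and W >= 0 else "infeasible"
    print(k, C, W, status)
```

With D = 7 and U = 5, the value k = 3/4 gives W = 8 and C = -1, which is infeasible, while k = 2/3 gives W = 6 and C = 1.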


Example 5 illustrates the possible inadequacy of a linear model. It also 
points out that one must always be careful about the accuracy of coefficients 
in linear equations, because the solution depends on them in a nonlinear 
fashion. A system of equations is called ill-conditioned if a large change in 
the answer can be produced by a small error in the value of a coefficient (or 
by a roundoff error in computation, such as writing 1/3 as .33). When k was
near 1, the system of equations in the canoe-with-sail model was very ill- 
conditioned. In Section 3.5 we learn how to calculate the condition number 
of a system of equations, which tells how poorly conditioned a system is. 
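
A small numerical experiment makes the idea of ill-conditioning concrete. The sketch below assumes the canoe-with-sail equations C + W = D and C + kW = U, which give C = (U - kD)/(1 - k) when solved for C, and compares the change in C caused by the same small error of .001 in k at several values of k. The function name canoe_speed and the size of the error are illustrative choices, not taken from the text.

```python
def canoe_speed(D, U, k):
    """C from C + W = D and C + k*W = U, that is, C = (U - k*D) / (1 - k)."""
    return (U - k * D) / (1 - k)

D, U = 7.0, 5.0
for k in (0.5, 0.9, 0.99):
    base = canoe_speed(D, U, k)
    bumped = canoe_speed(D, U, k + 0.001)    # a small error in the coefficient k
    print(f"k = {k}: C = {base:.3f}, change in C = {bumped - base:.3f}")
```

The closer k is to 1, the larger the change in C produced by the same error in k. (The negative values of C are infeasible, as noted in Example 5; the point here is only the sensitivity.)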


Section 1.1 Exercises


Summary of Exercises 

These exercises examine the five models presented in this section and ask 
the reader to make mathematical models of similar problems. Exercises 1-3
are based on Example 1; Exercises 4-8 on Example 2; Exercises 9-15 on 
Example 4; Exercises 16-19 on Example 5. All the exercises require only 
first-year high school algebra. 


1. In Example 1 about a falling object (dropped from the top of a 100-
foot building), what height will the object have after 1 second?


2. If the object in Example 1 were dropped from a 400-foot building, how
high would it be after 2 seconds? When would it hit the ground? What 
is the relation between the time of impact in this problem and the time 
of impact in the original 100-foot problem in Example 1? Guess the 
time of impact if the object were dropped from the top of a 1600-foot 
building, and verify that this guess is right. 


3. If the object in Example 1 were dropped from a building of height H_0,
solve equation (1) for the time when the object hits the ground (your
answer will involve H_0 and the constant 16). Make a graph plotting the
time when the object hits the ground as a function of building height. 


4. Suppose that John is twice Mary’s age but 4 years ago he was three
times her age. Express this information as a pair of linear equations
involving J (John’s current age) and M (Mary’s current age). Solve for
J and M.


5. Store A expects to sell three times as many books as store B this month.
But next month store B’s sales are scheduled to double while store A’s 
sales stay the same. Over the 2-month period, the two stores together 
are expected to sell a total of 4500 books. How many books would each 
store sell this month? 


6. Suppose that Mary and Nancy are sisters and the sum of their ages
equals their older brother Bill’s age, that Nancy is 4 years older than 
Mary, and that Bill is 6 years older than Nancy. Express this information 
in three linear equations in B (Bill’s age), M (Mary’s age), and N 
(Nancy’s age). Express B and N in terms of M, and then solve for the 
ages of the three children. 


7. A rectangle is twice as high as it is wide. The sum of the height and
width of this rectangle is equal to one-half the area of the rectangle.
Find the height and width.


8. A company has a budget of $280,000 for computing equipment. Three
types of equipment are available: microcomputers at $2000 each, ter-
minals at $500 each, and word processors at $5000 each. There should
be five times as many terminals as microcomputers and two times as
many microcomputers as word processors. How many machines of each
type should be purchased?


9. Suppose that the canoe in Example 4 went 5 miles per hour upstream
and 9 miles per hour downstream. Solve the resulting system of linear
equations algebraically to determine the speed of the canoe (in still
water) and the speed of the river.


10. Suppose that the canoe in Example 4 goes U miles per hour upstream
and D miles per hour downstream. Find general formulas in terms of 
U and D for the speed of the canoe (in still water) and the speed of the 
river. 


11. A company has $36,000 to hire a mathematician and his or her secre-
tary. Out of respect for the mathematician’s training, the mathematician
will be paid $8000 more than the secretary. How much will each be 
paid? 


12. Cook A cooks 2 steaks and 6 hamburgers in half an hour. Cook B cooks
4 steaks and 3 hamburgers in half an hour. If there is a demand for 16 
steaks and 21 hamburgers, how many half-hour periods should cook A 
work and how many half-hour periods should cook B work to fill this 
demand? 


13. We have two oil refineries. Refinery A produces 20 gallons of heating
oil and 8 gallons of diesel oil out of each barrel of petroleum it refines.
Refinery B produces 6 gallons of heating oil and 15 gallons of diesel
oil out of each barrel of petroleum it refines. There is a demand of 500
gallons of heating oil and 750 gallons of diesel oil. How many barrels
of petroleum should be refined at each refinery to equal this demand?


14. The sum of John’s weight and Sally’s weight is 20 pounds more than
four times the difference between their two weights (John is the heav-
ier). Twice Sally’s weight is 40 pounds more than John’s weight. Write
down these two facts in two linear equations. Simplify the first equation 
and solve the two equations to find the weights of John and Sally. 


15. Two ferries travel across a lake 50 miles wide. One ferry goes 5 miles
per hour slower than the other. If the slower ferry leaves 1 hour before
the faster and arrives at the opposite shore at the same time as the other 
ferry, what is the speed of the slower ferry? 


16. In Example 5, re-solve the downwind-upwind equations of (10) when
k = .9 with D = 7 and U = 5. Solve again with k = .9 but now 
D = 6.5 and U = 5. Does this small change in D result in an equally 
small change in C? 


17. In Example 5, suppose that D = 7 and U = 5, as in equations (10)
and (11). Then formula (14) for C becomes

C = (5 - 7k)/(1 - k)

Plot C as a function of k in the interval 0 ≤ k ≤ 1. Is C equal to
+infinity or equal to -infinity when k = 1? Explain your answer.


18. In the downwind-upwind system of equations in Example 5, suppose
that Mary were not paddling, so that C = 0. If k = .75 and U (upwind 
speed) = 6, what is W and what is D? In this case, what must be the 
relation between U and D? (That is, write U as a function of D.) 


19. In the system of equations for our canoe-with-sail model,

C + kW = 8
C + W = 12

pick the unknown k so that C = 0 when this system is solved. What
is W?


20. A refinery produces 8 gallons of gasoline and 6 gallons of heating oil
from each barrel of petroleum it refines. There is a demand for 400
gallons of gasoline and 200 gallons of heating oil. Can you set up a
linear equation to determine how many barrels of petroleum should be
refined? Explain your answer and tell how many barrels you would
refine if you were the manager of the refinery.


21. Suppose we estimate that a child’s IQ is the average of the parents’ IQs 
and that the income the child has when grown to the age of 40 is 200 
times the father’s IQ plus 100 times the mother’s IQ. If the child has 
an IQ of 120 and earns $48,000 at age 40, what are the parents’ IQs? 
Does this model give a reasonable answer? 


22. Suppose that factory A produces 12 tables and 6 chairs an hour while 
factory B produces 8 tables and 4 chairs an hour. How many hours 
should each factory work to produce 48 tables and 24 chairs? How 
many different solutions are there to this problem? 


23. We estimate that Jack can do 3 chemistry problems and 6 math problems 
in an hour, while Paula can do 4 chemistry problems and 7 math prob- 
lems in an hour. There are 11 chemistry problems and 17 math prob- 
lems. Set up and solve a system of two linear equations to determine 
how long Jack and Paula should work to do these problems. What is 
the matter with the solution? Propose a solution that makes more sense. 


Section 1.2 Systems of Linear Equations


In Section 1.1 we gave a quadratic formula H = -16T^2 + H_0 to model
mathematically the height of an object dropped from a building H_0 feet tall.
From this formula we could determine how long it took for an object to hit
the ground (see Example 1 of Section 1.1). The model involved a single
nonlinear equation with one variable to be determined. Nonlinear equations 
in one variable are not difficult to derive and solve. 

When many variables need to be determined, then almost surely the 
mathematical model will be a system of linear equations. There are four 
basic reasons for using linear models. 


1. There is a rich theory for analyzing and solving systems of linear equa- 
tions. There is limited theory and no general solution techniques for 
systems of nonlinear equations. 

2. Systems of nonlinear equations involving several variables exhibit very 
complex behavior and we rarely understand real-world phenomena well 
enough to use such complex models. 

3. Small changes in coefficients in nonlinear systems can cause huge 
changes in the behavior of the systems, yet precise values for these 
coefficients are rarely known. 

4. All nonlinear phenomena are approximately linear over small intervals; 
that is, a complicated curve can be approximated by a collection of many 
short line segments (see Figure 1.5). 
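
Reason 4 can be illustrated numerically. The following Python sketch approximates sin x on the interval from 0 to pi by n equal line segments and measures the largest gap between the curve and the segments. The choice of sin x, the interval, and the names piecewise_linear and max_error are assumptions made for this illustration.

```python
import math

def piecewise_linear(f, a, b, n):
    """Return a function that interpolates f linearly on n equal segments of [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]

    def approx(x):
        # Index of the segment containing x (the endpoint b uses the last segment).
        i = min(int((x - a) / (b - a) * n), n - 1)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        return ys[i] + slope * (x - xs[i])

    return approx

def max_error(f, g, a, b, samples=1000):
    """Largest |f(x) - g(x)| over equally spaced sample points in [a, b]."""
    points = (a + (b - a) * j / samples for j in range(samples + 1))
    return max(abs(f(x) - g(x)) for x in points)

for n in (2, 4, 8, 16):
    g = piecewise_linear(math.sin, 0.0, math.pi, n)
    print(n, max_error(math.sin, g, 0.0, math.pi))
```

The largest error shrinks as the number of segments grows, which is the sense in which a curve is approximately linear over small intervals.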
Figure 1.5 Line segments approximating a curve.


For these reasons we have the following general principle that is true
for numerical problems in all fields: physics, statistics, economics, for in-
stance.


General Mathematical Principle for Multivariable Problems. Any 
problem involving several unknowns is normally solved by recasting 
the problem as a system of linear equations. 


The two canoe problems, with and without a sail, in Section 1.1 were
simple examples of systems of linear equations. Here are two more exam- 
ples. These examples, together with those in Section 1.3, are used dozens 
of times throughout the book to motivate and illustrate theory and numerical 
methods. It is very important for the reader to gain familiarity with these
examples through working some of the numerical exercises at the end of
each section.


Example 1. Oil Refinery Model


A company runs three oil refineries. Each refinery produces three
petroleum-based products: heating oil, diesel oil, and gasoline. Suppose
that from 1 barrel of petroleum, the first refinery produces 20 gallons
of heating oil, 10 gallons of diesel oil, and 5 gallons of gasoline. The
second and third refineries produce different amounts of these three
products as described in the following table.


                  Refinery 1    Refinery 2    Refinery 3

   Heating oil:       20             4             4
   Diesel oil:        10            14             5              (1)
   Gasoline:           5             5            12

Let x_i be the number of barrels of petroleum used by the ith refinery.
Then the total amount of each product produced by the refineries is
given by the linear expressions


   Heating oil:   20x_1 +  4x_2 +  4x_3
   Diesel oil:    10x_1 + 14x_2 +  5x_3                           (2)
   Gasoline:       5x_1 +  5x_2 + 12x_3


Suppose that the demand is 500 gallons of heating oil, 850 gallons of
diesel oil, and 1000 gallons of gasoline. What values x_1, x_2, x_3 are
needed to produce these amounts? We require the x_i to satisfy the
following system of linear equations.


   20x_1 +  4x_2 +  4x_3 =  500
   10x_1 + 14x_2 +  5x_3 =  850                                   (3)
    5x_1 +  5x_2 + 12x_3 = 1000


Later in this book we shall learn several ways to solve systems
of linear equations. For now, let us use the tried-and-true method of
trial and error. As an initial guess, try x_1 = 25, x_2 = 25, x_3 = 25.
Using these values in (3), we get


   20(25) +  4(25) +  4(25) = 700
   10(25) + 14(25) +  5(25) = 725                                 (4)
    5(25) +  5(25) + 12(25) = 550


(It is helpful to have a programmable calculator or microcomputer for 
calculations with systems of linear equations.) 

We need to alter the x_i values to make the first expression (heating
oil) smaller and the last expression (gasoline) larger. Since x_1 makes
the largest contribution to the first expression and x_3 makes the largest
contribution to the last expression, we decrease x_1 and increase x_3.
Suppose that we try x_1 = 10, x_2 = 25, x_3 = 70.


   20(10) +  4(25) +  4(70) =  580
   10(10) + 14(25) +  5(70) =  800                                (5)
    5(10) +  5(25) + 12(70) = 1015


Although these x_i values are much better than the original ones, let us
try to do better. We need to decrease the first expression and increase
the second expression (without changing the third expression). To
decrease the first expression, we should decrease x_1. Similarly, to
increase the second expression, we increase x_2. Trying the following
values, we obtain the result that production levels

   x_1 = 5, x_2 = 35, x_3 = 70

yield

   heating oil = 520, diesel oil = 890, gasoline = 1040

This is getting quite close to our production goals. The overproduction
of 20 to 40 gallons in each product might be a reasonable safety margin
that is actually desirable.

To get closer, we should decrease x_2 and x_3 a little. The values

   x_1 = 5, x_2 = 33, x_3 = 68

yield

   heating oil = 504, diesel oil = 852, gasoline = 1006

This is an excellent fit. We have been a bit lucky. To do better, we
would probably have to use fractional values. (In the Exercises the
reader may need more tries to get this close.)
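
The bookkeeping in this trial-and-error search is easy to hand to a computer. The following is a minimal Python sketch, not part of the original example; the names YIELDS, DEMAND, production, and surplus are illustrative choices. It evaluates the expressions in (2) for each guess tried above and reports how far each product is from its demand.

```python
# Gallons produced per barrel, copied from table (1):
# one row per product, one column per refinery.
YIELDS = [
    [20, 4, 4],    # heating oil
    [10, 14, 5],   # diesel oil
    [5, 5, 12],    # gasoline
]
DEMAND = [500, 850, 1000]  # right-hand sides of system (3)


def production(x):
    """Evaluate the three linear expressions in (2) for barrels x = [x1, x2, x3]."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in YIELDS]


def surplus(x):
    """Production minus demand for each product (positive means overproduction)."""
    return [p - d for p, d in zip(production(x), DEMAND)]


# The four guesses tried in the text.
for guess in ([25, 25, 25], [10, 25, 70], [5, 35, 70], [5, 33, 68]):
    print(guess, production(guess), surplus(guess))
```

Printing the surplus next to each guess makes it easier to see which expression needs to go up or down before choosing the next guess.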


We next consider a slightly more complicated supply-demand model.
This model has the compensating advantage that trial-and-error
calculations to estimate a solution are easier. The reader is warned that
it takes a little while to get a feel for all the numbers in this model.


Example 2. A Model of General Economic 
Supply—Demand 


We present a linear model due to W. Leontief, a Nobel Prize—winning 
economist. The model seeks to balance supply and demand throughout 
a whole economy. For each industry, there will be one supply—demand 
equation. In practical applications, Leontief economic models can have 
hundreds or thousands of specific industries. We consider an example 
with four industries. 

The left-hand side of each equation is the supply, the amount
produced by the ith industry. Call this quantity x_i; it is measured in
dollars. On the right-hand side, we have the demand for the product
of the ith industry. There are two parts to the demand. The first part
is demand for the output by other industries (to create other products
requires some of this product as input). The second part is consumer
demand for the product.

For a concrete instance, let us consider an economy of four general
industries: energy, construction, transportation, and steel. Suppose
that the supply-demand equations are

                              Industrial Demands               Consumer
   Supply             Energy   Construct.  Transport.  Steel    Demand

   Energy:      x_1 = .4x_1  +  .2x_2   +  .2x_3   +  .2x_4  +   100
   Construct.:  x_2 = .3x_1  +  .3x_2   +  .2x_3   +  .1x_4  +    50
   Transport.:  x_3 = .1x_1  +  .1x_2              +  .2x_4  +   100    (6)
   Steel:       x_4 =           .1x_2   +  .1x_3             +     0


The first equation, for energy, has the supply of energy x_1 on the
left. The terms on the right of this equation are the various demands
that this supply must meet. The first term on the right, .4x_1, is the
input of energy required to produce our x_1 dollars of energy (.4 units
of energy input for one unit of energy output). Also, the second term
of .2x_2 is the input of energy needed to make x_2 dollars of construction.
Similarly, terms .2x_3 and .2x_4 are energy inputs required for
transportation and steel production. The final term of 100 is the fixed
consumer demand.

Each column gives the set of input demands of an industry. For
example, the third column tells us that to produce the x_3 dollars of
transportation requires as input .2x_3 dollars of energy, .2x_3 dollars of
construction, and .1x_3 dollars of steel. In the previous refinery model,
the demand for each product was a single constant quantity. In the 
Leontief model, there are many unknown demands that each industry’s 
output must satisfy. There is an ultimate consumer demand for each 
output, but to meet this demand industries generate input demands on 
each other. Thus the demands are highly interrelated: Demand for 
energy depends on the production levels of other industries, and these 
production levels depend in turn on the demand for their outputs by 
other industries, and so on. 

When the levels of industrial output satisfy these supply—demand 
equations, economists say that the economy is in equilibrium. 

As in the refinery model, let us try to solve this system of
equations by trial-and-error. As a first guess, let us set the production
levels at twice the consumer demand (the doubling tries to account for
the interindustry demands). So x_1 = 200, x_2 = 100, x_3 = 200, and
x_4 = 0; these are our supplies. Given these production levels, we can
compute the demands from (6).


                Supply   Demand
   Energy:       200     .4(200) + .2(100) + .2(200) + .2(0) + 100 = 240
   Construct.:   100     .3(200) + .3(100) + .2(200) + .1(0) +  50 = 180
   Transport.:   200     .1(200) + .1(100)           + .2(0) + 100 = 130    (7)
   Steel:          0               .1(100) + .1(200)               =  30

For our next approximation, let us try supply levels halfway
between the supply and demand values in (7). That is,
x_1 = (1/2)(200 + 240) = 220, and similarly, x_2 = 140, x_3 = 165, and
x_4 = 15.

                Supply   Demand
   Energy:       220     .4(220) + .2(140) + .2(165) + .2(15) + 100 = 252
   Construct.:   140     .3(220) + .3(140) + .2(165) + .1(15) +  50 = 192.5
   Transport.:   165     .1(220) + .1(140)           + .2(15) + 100 = 139    (8)
   Steel:         15               .1(140) + .1(165)                =  30.5


The second approximation is only moderately better. The interaction 
effects between different industries are hard to predict. Adjusting pro- 
duction levels was much easier in the refinery problem, where the 
demand for each product was constant. 

Let us stop trying to be clever and just use the simple-minded 
approach of setting production levels (i.e., supply levels) equal to the 
previous demand levels. So from (8), we try 


                Supply   Demand
   Energy:       252     .4(252) + .2(192) + .2(139) + .2(30) + 100 = 273
   Construct.:   192     .3(252) + .3(192) + .2(139) + .1(30) +  50 = 214
   Transport.:   139     .1(252) + .1(192)           + .2(30) + 100 = 150    (9)
   Steel:         30               .1(192) + .1(139)                =  33


The demand values here have been rounded to whole numbers. The 
supplies and demands are getting a little closer together in (9). Re- 
peating the process of setting the new supply levels equal to the pre- 
vious demand levels (i.e., the demands on the right side in (9)) yields 


                Supply   Demand
   Energy:       273     .4(273) + .2(214) + .2(150) + .2(33) + 100 = 289
   Construct.:   214     .3(273) + .3(214) + .2(150) + .1(33) +  50 = 229
   Transport.:   150     .1(273) + .1(214)           + .2(33) + 100 = 155    (10)
   Steel:         33               .1(214) + .1(150)                =  36


Repeating this process again, we have

                Supply   Demand
   Energy:       289     .4(289) + .2(229) + .2(155) + .2(36) + 100 = 300
   Construct.:   229     .3(289) + .3(229) + .2(155) + .1(36) +  50 = 240
   Transport.:   155     .1(289) + .1(229)           + .2(36) + 100 = 159    (11)
   Steel:         36               .1(229) + .1(155)                =  38


Observe that in successive rounds (9), (10), (11), supplies are 
rising. This is because as we produce more, we need more input which 
requires us to produce still more, and so on. It may be that this iteration 
will go on forever, and no equilibrium exists. On the other hand, the 
gap between supplies and demands is decreasing. 

Leontief proposed a constraint on the input costs that we shall 
show (in Section 3.4) guarantees that an equilibrium exists. The con- 
straint is 


Input Constraint. Every industry is profitable: Every industry must 
require less than $1 of inputs to produce $1 of output. 


In mathematical terms, this means that the sum of the coefficients
in each column must be less than 1. Our data in (6) satisfy this
constraint (the column sums are .8, .7, .5, and .5), so an equilibrium
does exist for this four-industry economy. Moreover, the iteration
process of repeatedly setting production levels equal to the previous
demands will converge to this equilibrium. The reader should check that
the following numbers are equilibrium values (rounded to the nearest
integer).


Equilibrium: energy = 325, construction = 265,
transportation = 168, steel = 43


Note that any system of linear equations can be rewritten in the form
of supply-demand equations with x_i appearing alone on the left side of the
ith equation, as in the Leontief supply-demand model (6). It is standard
practice to solve large systems of linear equations by some sort of iterative
method. The nature of the supply-demand equations suggested the iterative
scheme we used here, letting the demands from one round be the production
levels of the next round.
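
The iterative scheme of Example 2 can be sketched in a few lines of Python. This is an illustrative sketch rather than the book's own program: the names A, CONSUMER, demand, and iterate are assumptions, and the coefficients are those of (6). Each round replaces the supplies by the demands they generate, exactly as in (9), (10), and (11), except that no rounding is done between rounds.

```python
# Interindustry coefficients from (6). Row i lists the inputs demanded from
# industry i by energy, construction, transportation, and steel.
A = [
    [0.4, 0.2, 0.2, 0.2],   # energy
    [0.3, 0.3, 0.2, 0.1],   # construction
    [0.1, 0.1, 0.0, 0.2],   # transportation
    [0.0, 0.1, 0.1, 0.0],   # steel
]
CONSUMER = [100, 50, 100, 0]  # fixed consumer demands


def demand(x):
    """Right-hand sides of (6) for supply levels x = [x1, x2, x3, x4]."""
    return [sum(a * xj for a, xj in zip(row, x)) + c
            for row, c in zip(A, CONSUMER)]


def iterate(x, rounds):
    """Repeatedly set the supplies equal to the previous round's demands."""
    for _ in range(rounds):
        x = demand(x)
    return x


# Leontief's input constraint: every column sum must be less than 1.
column_sums = [sum(row[j] for row in A) for j in range(4)]
print(column_sums)

# First guess of the text, then many rounds starting from the supplies in (9).
print(demand([200, 100, 200, 0]))
print([round(v) for v in iterate([252, 192, 139, 30], 100)])
```

Since every column sum is at most .8, the gap between supply and demand shrinks by at least a fifth each round, so after 100 rounds the supplies should agree with the equilibrium values quoted in Example 2 once rounded to whole numbers.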


Section 1.2 Exercises 


Summary of Exercises 

Exercises 1-5 are based on the refinery model. Exercises 6-8 are other
problems involving a system of three linear equations in three unknowns.
Exercises 9-12 are based on the Leontief economic model. Exercises 13 and
14 involve converting the refinery problem into a system of Leontief-type
equations.

Exercises 1-3 refer to the refinery model in Example 1.


1. Suppose that refinery 1 processes 15 barrels of petroleum, refinery 2 
processes 20 barrels, and refinery 3 processes 60 barrels. With this 
production schedule, for which product does production deviate the 
most from the set of demands 500, 850, 1000? 


2. Suppose that the demand for heating oil grows to 800 gallons, while 
other demands stay the same. Find production levels of the three refi- 
neries to meet approximately this new set of demands (by ‘‘approxi- 
mately’’ we mean with no product off by more than 30 gallons). 


3. Suppose that refinery 3 is improved so that each barrel of petroleum 
yields 8 gallons of heating oil, 10 gallons of diesel oil, and 20 gallons 
of gasoline. Find production levels of the three refineries to meet ap- 
proximately the demands (by ‘‘approximately’’ we mean with no prod- 
uct off by more than 30 gallons). 


4. Consider the following refinery model. There are three refineries 1, 2, 
and 3 and from each barrel of crude petroleum, the different refineries 
produce the following amounts (measured in gallons) of heating oil, 
diesel oil, and gasoline. 


Refinery 1 Refinery 2 Refinery 3 


Heating oil 6 3 2 
Diesel oil 4 6 3 
Gasoline 3 2 6 


Suppose that we have the following demand: 


280 gallons of heating oil, 
350 gallons of diesel oil, and 
350 gallons of gasoline. 


(a) Write a system of equations whose solution would determine
production levels to yield the desired amounts of heating oil, diesel
oil, and gasoline. As in Example 1, let x_i be the number of barrels
processed by the ith refinery.

(b) Find an approximate solution to this system of equations with no 
product off by more than 30 gallons from its demand. 


5. Repeat the refinery model in Exercise 4 with new demand levels of 500 
gallons heating oil, 300 gallons diesel oil, and 600 gallons gasoline. 
Try to find an approximate solution (within 30 gallons) with this set of 
demands. Something is going wrong and there is no valid set of
production levels to attain this set of demands. What is invalid about the
solution to this refinery problem?


Extra Credit: Try to explain in words why this set of demands is
unattainable while the demands in Exercise 4 were attainable.


6. The staff dietician at the California Institute of Trigonometry has to 
make up a meal with 600 calories, 20 grams of protein, and 200 mil- 
ligrams of vitamin C. There are three food types to choose from: rubbery 
jello, dried fish sticks, and mystery meat. They have the following 
nutritional content per ounce. 


Jello Fish Sticks Mystery Meat 


Calories 10 50 200 
Protein 1 3 2
Vitamin C 30 10 0 


(a) Make a mathematical model of the dietician’s problem with a sys- 
tem of three linear equations. 
(b) Find an approximate solution (accurate to within 10%). 


7. A furniture manufacturer makes tables, chairs, and sofas. In one month, 
the company has available 300 units of wood, 350 units of labor, and 
225 units of upholstery. The manufacturer wants a production schedule 
for the month that uses all of these resources. The different products 
require the following amounts of the resources. 


Table Chair Sofa 


Wood 4 1 3
Labor 3 
Upholstery 2 0 4 


(a) Make a mathematical model of this production problem. 
(b) Find an approximate solution (accurate to within 10%). 


8. A company has a budget of $280,000 for computing equipment. Three 
types of equipment are available: microcomputers at $2000 a piece, 
terminals at $500 a piece, and word processors at $5000 a piece. There 
should be five times as many terminals as microcomputers and two 
times as many microcomputers as word processors. Set this problem up 
as a system of three linear equations. Determine approximately how
many machines of each type there should be by solving by trial and error.

Note: Check your answer by expressing the numbers of terminals and
microcomputers in terms of the number of word processors and solving 
the remaining single equation in one unknown. 


Exercises 9-11 are based on the Leontief model in Example 2. 


9. If we produced 300 units of energy, 250 units of construction, 160 units
of transportation, and 40 units of steel, what would be the largest
deviation between supply and demand among the four commodities?


10. Start the iteration procedure followed in (9), (10), and (11) with an
initial set of supplies equal to the consumer demands, that is, x_1 =
100, x_2 = 50, x_3 = 100, x_4 = 0. Compute the right sides of the
equations in (6) with this set of x_i's and let the resulting numbers be
the new values for x_1, x_2, x_3, x_4; compute the right sides again with
these new x_i's; and so on. Do this iteration five times. Do the successive
sets of x_i's appear to be converging toward the equilibrium values given
at the end of Example 2?


11. This exercise explores the effect on all industries of changes in one
industry. Quadrupling the price of petroleum had a widespread effect
on all industrial sectors in the 1970s. But smaller changes in one
seemingly unimportant industry can also result in important changes in
many other industries.

(a) Change the system of equations in the Leontief model in (6) by
decreasing the coefficient of x_1 (energy) in the construction equation
from .3 to .2 (this is the result of new energy efficiencies in
construction equipment). We want to know how this change affects
our economy. Iterate five times, as in Exercise 10, with this altered
system using as starting x_i's the equilibrium values for the original
model: x_1 = 325, x_2 = 265, x_3 = 168, x_4 = 43.

(b) Repeat part (a), but now decrease the coefficient of x_4 (steel) in the
transportation equation from .2 to .1.

(c) Repeat part (a), but now increase the coefficient of x_2 (construction)
in the energy equation from .2 to .3.


12. Consider the Leontief system

   x_1 = .4x_1 + .3x_2 + .3x_3 + 100
   x_2 = .3x_1 + .4x_2 + .3x_3 + 100
   x_3 = .3x_1 + .3x_2 + .4x_3 + 100

Here the column sums are 1, violating the Leontief input constraint
given in the text. Show that this system cannot have a solution.
Hint: Add the three equations together.


Extra Credit: Try to explain in economic terms why no solution exists.


13. (a) Rewrite the refinery model's equations in (3) to look like Leontief
equations as follows. Divide the first equation by 20 (the coefficient
of x_1 in that equation), divide the second equation by 14, and divide
the third equation by 12. Next move the x_2 and x_3 terms in the new
first equation over to the right side, leaving just x_1 on the left (the
new equation should be x_1 = -.2x_2 - .2x_3 + 25); similarly, in
the second equation leave just x_2 on the left and in the third equation
leave just x_3 on the left. Note that the Leontief input constraint
about column sums is not satisfied by this system.

(b) Use the iteration method introduced in equations (9), (10), and (11)
(see also Exercise 10) to get an approximate solution to the refinery
problem (do five iterations, starting with the ‘‘consumer demands’’
of 25, 50, 100).


14. (a) Rewrite the refinery model's equations in (3) to look somewhat like
Leontief equations as follows. In the first equation, move the x_2
and x_3 terms to the right side and also move 19 of the 20 units of
the x_1 term to the right, leaving just x_1 on the left (the equation is
now x_1 = -19x_1 - 4x_2 - 4x_3 + 500). Similarly, in the second
equation leave just x_2 on the left; move everything else to the right
side. In the third equation leave just x_3 on the left. Note that
Leontief's input constraint about column sums is far from satisfied by
this system.

(b) Try using the iteration method introduced in equations (9), (10),
and (11) (see also Exercise 10) to get an approximate solution to
the refinery problem (do five iterations starting with the guess of
x_1 = 25, x_2 = 50, x_3 = 100). Does the iteration process seem to
be converging?


Section 1.3 Markov Chains and
Dynamic Models


The refinery and economic supply—demand models of Section 1.2 were static 
in the sense that we solved them once and that was it. There was one set of 
production levels required, not a sequence of levels that would be needed 
to describe an economy changing over time. A model that tries to predict 
the behavior of a system over a period of time is called a dynamic model. 
In this section we examine two dynamic linear models. 

The first dynamic model we consider involves probability. This model, 
called a Markov chain, will arise over and over in this book, so it is important 
to understand the model well. The concepts of probability we need for this 
model are simple and intuitive. 

A Markov chain is a probabilistic model that describes the random 
movement over time of some activity. At each period of time, the activity 
is in one of several possible states. States might be amounts won in
gambling, different weather conditions (e.g., sunny, snowy), or numbers of jobs
waiting to be processed by a computer. The model specifies probabilities 
that tell how the activity changes states from period to period. Consider a 
simple two-state Markov chain. 


Example 1. Markov Chain for Weather


Suppose that we have two states of the weather: sunny or cloudy. If 
it is sunny today, the probability is } that it will be sunny tomorrow, 
and ; that it will be cloudy tomorrow. If it is cloudy today, then the 
probability is 4 that it will be sunny tomorrow and 4 that it will be 
cloudy tomorrow. It is convenient to display these probabilities in an 
array. 


Today 
Sunny Cloudy 


Sunny 3 3 
Cloudy } $ a 


Tomorrow 


The probabilities in this array are called transition probabilities, and 
the array is called a transition matrix. The probabilities in each column of 
the transition matrix must add up to 1. A convenient way to display the 
information in a Markov chain is with a transition diagram. The diagram for 
the weather Markov chain is drawn in Figure 1.6. There is a node for each 
state and arrows between states. Beside the arrow from state A to state B 
we write the transition probability of going from state A to state B. 

The transition probabilities of a Markov chain tell us the chances of
being in different states one period later. We need a formula from probability
theory to be able to calculate the probabilities of where an activity will be
after several periods. As in weather forecasting, it is predictions many
periods into the future that are most interesting.

To state this probability formula, we need to introduce some notation.
Let p_1, p_2, ..., p_n be the probabilities of being in state 1, state 2,
..., state n. This set of probabilities is called a probability
distribution. Let a_ij be the transition probability of going from state j
to state i. In the weather Markov chain, if state 1 is sunny and state 2
is cloudy, then a_11 is the probability of going from sunny to sunny,
a_21 from sunny to cloudy, a_12 from cloudy to sunny, and a_22 from
cloudy to cloudy; their values are the entries of the transition matrix
in Example 1.

Figure 1.6 Weather Markov chain.


Formula for Distribution of Next States in Markov Chain. If
p_1, p_2, ..., p_n is the current probability distribution of the activity,
then the probability distribution p_1', p_2', ..., p_n' for the next period
is given by

   p_1' = a_11 p_1 + a_12 p_2 + a_13 p_3 + ... + a_1n p_n
   p_2' = a_21 p_1 + a_22 p_2 + a_23 p_3 + ... + a_2n p_n
   p_3' = a_31 p_1 + a_32 p_2 + a_33 p_3 + ... + a_3n p_n          (1)
     ...
   p_n' = a_n1 p_1 + a_n2 p_2 + a_n3 p_3 + ... + a_nn p_n

To illustrate this formula with the weather Markov chain, let p, = 3 


and p, = 4. Then tomorrow’s distribution p}, p} is 


Py = 4p; + Qypp2 =4°9+3°F = T8 (2) 
P, = yp, + Qynp. = 4-9 +3°3 


We explain the formulas in (2) intuitively as follows. We can be in
state 1 (sunny) tomorrow either because we are in state 1 today and then
stay in state 1 (this has probability a_11 p_1) or because we are in state 2
today and then switch to state 1 (this has probability a_12 p_2). [To compute
the probability of a sequence of two events, such as (i) the probability p_2 of
now being in state 2, and (ii) the probability a_12 of switching from state 2
to state 1, we multiply these two probabilities together, to get a_12 p_2.]
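
Formula (1) translates directly into a short function. The sketch below is an illustration only: the function names are assumptions, and the two-state transition probabilities are hypothetical values chosen for the demonstration, not the weather probabilities of Example 1. Entry a[i][j] holds the probability of going from state j to state i (with states numbered from 0 in the code), so each column must add up to 1.

```python
def columns_sum_to_one(a, tol=1e-12):
    """Check that every column of the transition array adds up to 1."""
    n = len(a)
    return all(abs(sum(a[i][j] for i in range(n)) - 1) < tol for j in range(n))


def next_distribution(a, p):
    """Apply formula (1): the new p[i] is the sum over j of a[i][j] * p[j]."""
    n = len(p)
    return [sum(a[i][j] * p[j] for j in range(n)) for i in range(n)]


# Hypothetical two-state chain (illustrative values, not those of Example 1).
a = [
    [0.9, 0.4],   # probabilities of moving INTO state 0 from states 0 and 1
    [0.1, 0.6],   # probabilities of moving INTO state 1 from states 0 and 1
]
p = [0.5, 0.5]    # hypothetical current distribution

assert columns_sum_to_one(a)
p_next = next_distribution(a, p)
print(p_next, sum(p_next))
```

Because each column of the array sums to 1, the new probabilities again add up to 1, which is a convenient check on hand calculations.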

Let us next consider a larger Markov chain that models the action of 
a popular video arcade game called Frogger. 


Example 2. Frogger Markov Chain


We model the behavior of a frog jumping around on a four-lane
highway. The possible states range from 1 = left side of highway, to
6 = right side of highway. See Figure 1.7a. Suppose that the following
array gives the transition probabilities that the frog, if now in state j,
will be in state i one minute later. The transition diagram for this
Markov chain is given in Figure 1.7b.


                        State in Current Period

                      1     2     3     4     5     6

   State        1    .50   .25    0     0     0     0
   in           2    .50   .50   .25    0     0     0
   Next         3     0    .25   .50   .25    0     0
   Period       4     0     0    .25   .50   .25    0            (3)
                5     0     0     0    .25   .50   .50
                6     0     0     0     0    .25   .50


Figure 1.7 (a) Six different states for frogs on four-lane highway,
running from the left curb to the right curb. (b) Transition diagram for
Markov chain in Example 2; there is a node for each state. The number
beside the edge from state j to state i is the probability of going from
state j to state i.

The formulas for next-period probabilities with the frog Markov
chain are

   p_1' = .50p_1 + .25p_2
   p_2' = .50p_1 + .50p_2 + .25p_3
   p_3' = .25p_2 + .50p_3 + .25p_4                                (4)
   p_4' = .25p_3 + .50p_4 + .25p_5
   p_5' = .25p_4 + .50p_5 + .50p_6
   p_6' = .25p_5 + .50p_6


Suppose that the frog starts in state 1 (left side of highway). Then
its probability distribution after 1 minute is given by the probabilities
in the first column of (4), since initially p_1 = 1 and other p_i = 0:

   p_1 = .5,  p_2 = .5,  other p_i = 0                            (5)

Let us use (4) to compute the probability distribution for the frog after
2 minutes (remember that only p_1 and p_2 are nonzero):

   p_1' = .5p_1 + .25p_2 = .5(.5) + .25(.5) = .375
   p_2' = .5p_1 + .5p_2  = .5(.5) + .5(.5)  = .5                  (6)
   p_3' = .25p_2         = .25(.5)          = .125

Other p_i = 0. Note that the sum of the probabilities in (6) equals 1,
as it should.

We can continue iterating with formula (4) to find the distribution
after 3 minutes, after 4 minutes, and so on (it helps to let a computer
do this). Table 1.1 gives these probabilities, assuming that the frog
started in state 1 (left side of highway).

Table 1.1

                               State
   Minutes      1       2       3       4       5       6
       0        1       0       0       0       0       0
       1      .50     .50       0       0       0       0
       2      .375    .50     .125      0       0       0
       3      .312    .469    .188    .031      0       0
       4      .273    .438    .219    .063    .007      0
       5      .246    .410    .234    .088    .020    .002
       6      .226    .387    .242    .107    .033    .006
      10      .176    .320    .241    .150    .083    .030
      15      .144    .272    .226    .172    .128    .056
      20      .126    .244    .217    .183    .157    .073
      25      .116    .226    .210    .190    .173    .084
     100      .1      .2      .2      .2      .2      .1
     200      .1      .2      .2      .2      .2      .1
    1000      .1      .2      .2      .2      .2      .1

Observe that the probabilities converge to the distribution
p_1 = .1, p_2 = .2, p_3 = .2, p_4 = .2, p_5 = .2, p_6 = .1 (and then
stay the same forever). Very interesting!

Would this long-term distribution evolve from any starting
distribution? Do all Markov chains exhibit this type of long-term
distribution? Can the long-term distribution be computed more simply than
iterating the equations in (4) 100 times? (Answer: Definitely yes.)
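
Letting a computer do the iteration might look like the following Python sketch. It is offered as an illustration, not as the program used to prepare Table 1.1; the names A, step, and distribution_after are assumptions. The array holds the transition probabilities of (3), and each call to step applies the equations in (4) once.

```python
# Transition probabilities from array (3):
# column j is the current state, row i is the state one minute later.
A = [
    [0.50, 0.25, 0.00, 0.00, 0.00, 0.00],
    [0.50, 0.50, 0.25, 0.00, 0.00, 0.00],
    [0.00, 0.25, 0.50, 0.25, 0.00, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.00, 0.25, 0.50, 0.50],
    [0.00, 0.00, 0.00, 0.00, 0.25, 0.50],
]


def step(p):
    """One application of the equations in (4)."""
    return [sum(a * pj for a, pj in zip(row, p)) for row in A]


def distribution_after(minutes, start):
    """Distribution reached from `start` after the given number of minutes."""
    p = list(start)
    for _ in range(minutes):
        p = step(p)
    return p


start = [1, 0, 0, 0, 0, 0]  # the frog begins in state 1 (left side of highway)
for minutes in (1, 2, 3, 10, 100):
    print(minutes, [round(v, 3) for v in distribution_after(minutes, start)])
```

Changing `start` to another distribution is a quick way to experiment with the first question posed above.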


Markov chains are a very useful type of linear model (linear because
formula (1) for next-state probabilities is a system of linear equations).
Part of their usefulness is due to results in matrix algebra that provide
simple answers to all the questions just posed and many more.

Next we look at a simpler dynamic model, an ecological model that 
traces the sizes of populations of rabbits and foxes. The simplicity of the 
model allows us to experiment more, changing the values of the coefficients 
to exhibit a variety of different long-term trends. 


Example 3. Growth Model for Rabbits
and Foxes


Consider the following model for the monthly growth of populations
of foxes and rabbits. If R and F are the numbers of rabbits and foxes
this month, let R' and F' be the numbers next month given in the
equations


   R' = R + bR - eF                                               (7)
   F' = F - dF + e'R


where b is the birthrate of rabbits, d the death rate of foxes, and e and
e' are eating rates. The term -eF in the rabbit equation is negative and
the term +e'R in the fox equation is positive because foxes eat rabbits.
The results R', F' after 1 month can become new values for R and F
to project the populations 2 months hence, and so on, as we did in the
frog Markov chain.

Normally, the (positive) constants would be estimated for us by 
ecologists. But let us make up some reasonable-sounding values for 
these constants and see what sort of behavior this model predicts. 
Suppose that we try 


   R' = R + .2R - .3F                                             (8)
   F' = F - .1F + .1R


and start with R = 100, F = 100. Then using (8) repeatedly to
compute the populations in successive months, we get


   0 months:   100 rabbits,   100 foxes
   1 month:     90 rabbits,   100 foxes
   2 months:    78 rabbits,    99 foxes
   3 months:    64 rabbits,    97 foxes
   4 months:    48 rabbits,    94 foxes                           (9)
   5 months:    29 rabbits,    89 foxes
   6 months:     8 rabbits,    83 foxes
   7 months:   -15 rabbits,    76 foxes


A negative number means that the rabbits became extinct.
If there are no rabbits, then the fox equation in (8) becomes


   F' = F - .1F = .9F


and the foxes will eventually die out, too. This behavior of foxes killing 
off the rabbits and then starving to death is reasonable. 

Let us try new starting values for R and F that will allow the
rabbits to increase in size. The term +.2R - .3F in the rabbit equation
is the amount the rabbit population changes from this month to the
next. For this term to be positive we require .2R to be more than .3F.
Suppose that we choose R = 100, F = 50:


    0 months:   100 rabbits,   50 foxes
    1 month:    105 rabbits,   55 foxes
    2 months:   109 rabbits,   60 foxes
    3 months:   113 rabbits,   65 foxes
    4 months:   116 rabbits,   70 foxes
    5 months:   119 rabbits,   75 foxes
    6 months:   120 rabbits,   79 foxes
    7 months:   121 rabbits,   83 foxes                           (10)
    8 months:   120 rabbits,   87 foxes
    9 months:   118 rabbits,   90 foxes
   10 months:   115 rabbits,   93 foxes
   11 months:   110 rabbits,   95 foxes
   12 months:   103 rabbits,   97 foxes
   13 months:    94 rabbits,   97 foxes
   14 months:    84 rabbits,   97 foxes
   15 months:    72 rabbits,   96 foxes


While the rabbits increased initially, so did the foxes (since they fed 
off the rabbits). After 7 months there were enough foxes so that they 
were eating rabbits faster than new rabbits were being born and the 
rabbit population began to decline. After 13 months (when there are 
fewer rabbits than foxes), the foxes begin to decline, also. Now we 
are in the same situation as before, in (9). 

It appears that the equations in our model (8) make it inevitable 
that the foxes will grow to a level where they eat rabbits faster than 
rabbits are born, causing the rabbits to decline to extinction. Then the 
foxes become extinct, too. The reader is asked in the Exercises to try 
other values for b, d, e, and e’ in this model and to explore the resulting 
behavior. 
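
For readers who want to experiment with other coefficients, here is a minimal Python sketch of model (7). It is an illustration under stated assumptions: the function name simulate and the parameter name e_prime (standing for e') are choices made here, and the default values are the coefficients of (8). Populations are kept as unrounded numbers and rounded only for display.

```python
def simulate(rabbits, foxes, months, b=0.2, d=0.1, e=0.3, e_prime=0.1):
    """Iterate model (7); the default coefficients are those used in (8)."""
    history = [(rabbits, foxes)]
    for _ in range(months):
        # Both new values are computed from this month's R and F.
        rabbits, foxes = (rabbits + b * rabbits - e * foxes,
                          foxes - d * foxes + e_prime * rabbits)
        history.append((rabbits, foxes))
    return history


# Reproduce the first experiment: R = 100, F = 100 for 7 months.
for month, (r, f) in enumerate(simulate(100, 100, 7)):
    print(month, round(r), round(f))
```

Calling, say, `simulate(100, 50, 15)` repeats the second experiment, and passing different values of b, d, e, and e_prime gives one way to explore other behavior of the model.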

Akin to the stable probability distribution in the frog Markov 
chain, let us try to determine values of the coefficients in our model 
that will permit the rabbit and fox populations to stabilize, that is, to 
remain the same forever. This means that R' = R and F' = F.

We return to the original general model


   R' = R + bR - eF                                               (11)
   F' = F - dF + e'R


When R' = R and F' = F, we have


   R = R + bR - eF
   F = F - dF + e'R


or


   bR - eF = 0                                                    (12)
   e'R - dF = 0




Note that in (12), the order of the terms +e'R and -dF in the second
equation was reversed. Let us solve this pair of linear equations for R
and F. Obviously, R = F = 0 is a solution. But we want another
solution. We use the standard method for eliminating one of the
variables in (12): Multiply the first equation by d and the second by e
and then subtract the second from the first.


     dbR - deF = 0
   -(ee'R - edF = 0)
   -----------------
    (bd - ee')R = 0


If bd - ee' = 0, then R (and F) need not be 0. Note that

   bd - ee' = 0   <=>   bd = ee'   <=>   e/b = d/e'               (13)


Suppose that e/b = d/e'. Then one can show that R and F are stable
values for (11) if and only if

   R = (e/b)F = (d/e')F                                           (14)

For example, the system

   R' = R + .1R - .15F                                            (15)
   F' = F - .15F + .1R

has stable values R = 15, F = 10 or R = 6, F = 4. In fact, any
pair (R, F) is stable in (15) if

   Stable R, F values for (15):   F = (2/3)R   or   R = (3/2)F    (16)


Further, if we start with values for R and F that are not stable, then 
over successive months the rabbit and fox populations always move 
toward one of these stable pairs of values, just like the Markov chain. 
In this model the ‘‘law of nature’’ is that there should be a 3 : 2 ratio 
of rabbits to foxes. Figure 1.8 shows sample curves along which un- 
stable values move in approaching a stable value. For example, if we 
start iterating (15) with R = 50, F = 40, we have 


       0 months:   50    rabbits,   40    foxes
       1 month:    49    rabbits,   39    foxes
       2 months:   48    rabbits,   38    foxes
       3 months:   47    rabbits,   37    foxes
      10 months:   42    rabbits,   32    foxes
      20 months:   37    rabbits,   27    foxes
      30 months:   34    rabbits,   24    foxes              (17)
      40 months:   32.5  rabbits,   22.5  foxes
      50 months:   31.5  rabbits,   21.5  foxes
      75 months:   30.4  rabbits,   20.4  foxes
     100 months:   30.1  rabbits,   20.1  foxes


Figure 1.8 Stable values and trajectories to stable values in rabbit-fox growth model. [plot not reproduced; axes: Rabbits (horizontal), Foxes (vertical); the stable point (15, 10) is marked]


Clearly, the populations are approaching the stable sizes of 30 rabbits
and 20 foxes. 

The starting pair (100, 80) goes to (60, 40); (80, 30) goes to 
(150, 100). If we write the starting values in the form R = r, F = 
r — k, then the limiting stable values will always turn out to be 
R = 3k, F = 2k. If the starting values have R < F, the limiting values 
will both be negative, with rabbits becoming negative (extinct) first. 
If R = F, both populations approach 0. 

Why does this happen? How much of this behavior would occur 
if we used other values for the constants b, d, e, e’ that satisfied the 
condition e/b = d/e'?
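
The iteration of (15) described above can be reproduced with a few lines of Python. This is only an illustrative sketch, not part of the original text; the function names step and iterate are invented for the example, and the default coefficients are the ones in (15).

```python
def step(R, F, b=0.1, e=0.15, d=0.15, e_prime=0.1):
    """One month of the linear model: R' = R + bR - eF, F' = F - dF + e'R."""
    return R + b * R - e * F, F - d * F + e_prime * R


def iterate(R, F, months):
    """Apply step() the given number of times and return the final (R, F)."""
    for _ in range(months):
        R, F = step(R, F)
    return R, F


if __name__ == "__main__":
    # Same starting point as table (17): 50 rabbits and 40 foxes.
    for months in (0, 1, 2, 3, 10, 20, 30, 40, 50, 75, 100):
        R, F = iterate(50, 40, months)
        print(f"{months:3d} months: {R:6.1f} rabbits, {F:6.1f} foxes")
```

Running iterate with other starting pairs, such as (100, 80) or (80, 30), for several hundred months is one way to check the limiting values quoted above.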


We conclude this section by introducing a nonlinear model for rabbit 
and fox populations. This nonlinear model can simulate a cyclic behavior 
that occurs frequently in nature. 


Example 4. Nonlinear Model for Rabbits and Foxes


Let us consider the following nonlinear model for the monthly growth 
of populations of foxes and rabbits. Again, if R and F are the numbers 
of rabbits and foxes this month, then R’ and F' are the numbers next 
month. Our system of equations is 


R' = R + bR — eRF (18) 
F' = F — dF + e'RF 


We now use the terms —eRF and + e’RF for the effect of foxes eating 
rabbits because the chances of a fox catching a rabbit depend on the 
abundance of both species. 

Let us suppose that b = d = .1 and e = e' = .01.


     R' = R + .1R - .01RF                                    (19)
     F' = F - .1F + .01RF


If initially we let R = 30 and F = 30, then using (19) we obtain 
the table whose values are plotted in Figure 1.9. 


Number of Months    Rabbits    Foxes

        1              30        30
        2              32        28
        3              34        26
        5              40        23
       10              58        17
       15              87     [illegible]
       20             132        15
       25             196        20
       30             282        36
       35             354        96                          (20)
       40             261       278
       45              62       389
       50              15       280
       60               6       107
       70               7        40
       80              15        15
       90              35         7
      100              88         4


Figure 1.9 Cyclic pattern of population sizes in nonlinear rabbit-fox growth model. [plot not reproduced; axes: Rabbits (horizontal), Foxes (vertical), with points on the curve labeled by month]


When we started with a small number of both rabbits and foxes, 
the fox population declined further for a few months (from the -dF
term) while the rabbit population started growing (from the +bR term);
the —eRF and +e’RF terms have little effect because the constants e 
and e’ are so small. Soon the fox population grows and the rabbit 
population stops growing and starts to decline as the foxes eat the 
rabbits (the terms —eRF and +e'RF come into effect when R and F 
are large). As the rabbit population declines, so the fox population 
soon declines because of less food. We eventually find the sizes of the 
two populations back at the levels at which we started. The cycle time 
in this model is about 80 months. 
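
The nonlinear model (18) can be followed month by month in the same way. The Python sketch below is an illustration only; the names step and simulate are invented for it. One assumption needs to be stated loudly: taken literally, (19) with R = F = 30 gives 24 rabbits and 36 foxes after one month, which does not match the tabulated values. The opening rows of the table are instead reproduced with e = e' = .001 (equivalently, by running (19) from R = F = 3 and multiplying the results by 10), with the row labeled month 1 being the starting values. The sketch therefore uses .001 as its default, and this is an inference from the numbers rather than something the text states.

```python
def step(R, F, b=0.1, d=0.1, e=0.001, e_prime=0.001):
    """One month of the nonlinear model: R' = R + bR - eRF, F' = F - dF + e'RF.

    The defaults e = e' = 0.001 are an assumption chosen to match the
    tabulated values; pass e=0.01, e_prime=0.01 for (19) as printed.
    """
    return R + b * R - e * R * F, F - d * F + e_prime * R * F


def simulate(R, F, months, **params):
    """Return the list of (R, F) pairs for months 0, 1, ..., months."""
    history = [(R, F)]
    for _ in range(months):
        R, F = step(R, F, **params)
        history.append((R, F))
    return history


if __name__ == "__main__":
    history = simulate(30, 30, 99)
    # Row k of the table corresponds to history[k - 1].
    for row in (1, 2, 3, 5, 10, 15, 20, 30, 40, 50, 60, 80, 100):
        R, F = history[row - 1]
        print(f"{row:3d}: {R:7.1f} rabbits, {F:7.1f} foxes")
```

Passing a smaller time step is not supported by this sketch; it simply repeats the monthly update.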

This cycle of sudden growth followed by sudden decline char- 
acterizes the behavior of periodic pests, such as gypsy moths. The 
moths are dormant for many years and then have sudden, major out- 
breaks when few of their natural enemies are present (for gypsy moths, 
the enemy is a parasitic wasp). Eventually, the large numbers of the 
pest stimulate the appearance of its predator. The moths are killed off 
by the wasps and then the wasps die for lack of food. The dormant 
period begins again. 

Linear models cannot produce this type of behavior.


Note that after one cycle (80 months) the population sizes in (20) are
15 rabbits and 15 foxes, half the starting sizes of 30. This slippage in size
is a fault in the model, due to the fact that we rounded time into units of 
months. If time were measured in days or seconds, the slippage would be 
less. Exercise 21 describes how to convert this model into units of days or 
seconds. 


Section 1.3 Exercises 


Summary of Exercises 

Exercises 1-11 involve Markov chain models. Exercises 12—19 examine the 
model in Example 3 and similar linear growth models. Exercises 20 and 21 
examine the behavior of the nonlinear model in Example 4. Exercises 11, 
20, and 21 require computer programs.


1. Using the weather Markov chain in Example 1, simulate the weather
over 10 days by flipping a coin to determine the chances of sunny or
cloudy weather the next day according to the Markov chain’s transition 
probabilities. If currently sunny, flip once and a head means sunny the 
next day and a tail means cloudy the next day. If currently cloudy, flip 
twice and when either flip is a head it is sunny the next day and when 
both flips are tails it is cloudy the next day. To start, assume that the 
previous day was sunny. What fraction of the 10 days was sunny? 


2. In the weather Markov chain, starting with the probability distribution
(1, 0) (a sunny day), compute and plot (in p1, p2 coordinates) the
distribution over five successive days. Repeat the process starting with 
the probability distribution (0, 1). Can you guess the value of the stable 
distribution to which your points are converging? 
3. In the frog Markov chain, what is the probability distribution in the
next period if the current distribution is 


(a) p; = 1, all other p; = 0? 

(b) p, = .5, p, = .5, all other p; = 0? 

(c) ps = .25, ps = .25, py = .5, all other p; = 0?

(d) p1 = .1, p2 = .2, p3 = .2, p4 = .2, p5 = .2, p6 = .1?


4. The printing press in a newspaper has the following pattern of break- 
downs. If it is working today, tomorrow it has 90% chance of working 
(and 10% chance of breaking down). If the press is broken today, it 
has a 60% chance of working tomorrow (and 40% chance of being
broken again).

(a) Make a Markov chain for this problem; give the matrix of transition 
probabilities and draw the transition diagram. 

(b) If there is a 50-50 chance of the press working today, what are the 
chances that it is working tomorrow? 

(c) If the press is working today, what are the chances that it is working
in 2 days’ time? 


5. If the local professional basketball team, the Sneakers, wins today’s 
game, they have a § chance of winning their next game. If they lose 
this game, they have a 4 chance of winning their next game. 

(a) Make a Markov chain for this problem; give the matrix of transition 
probabilities and draw the transition diagram. 

(b) If there is a 50-50 chance of the Sneakers winning today’s game, 
what are the chances that they win their next game? 

(c) If they won today, what are the chances of winning the game after 
the next? 


6. If the stock market went up today, historical data show that it has a 
60% chance of going up tomorrow, a 20% chance of staying the same, 
and a 20% chance of going down. If the market was unchanged today, 
it has a 20% chance of being unchanged tomorrow, a 40% chance of 
going up, and a 40% chance of going down. If the market goes down 
today, it has a 20% chance of going up tomorrow, a 20% chance of 
being unchanged, and a 60% chance of going down. 

(a) Make a Markov chain for this problem; give the matrix of transition 
probabilities and the transition diagram.

(b) If there is a 30% chance that the market goes up today, a 10% 
chance that it is unchanged, and a 60% chance that it goes down, 
what is the probability distribution for the market tomorrow? 


7. The following model for learning a concept over a set of lessons
identifies four states of learning: I = Ignorance, E = Exploratory Thinking,
S = Superficial Understanding, and M = Mastery. If now in state I,
after one lesson you have 3 probability of still being in I and 3
probability of being in E. If now in state E, you have 4 probability of being
in I, $ in E, and 3 in S. If now in state S, you have 4 probability of
being in E, 3 in S, and 3 in M. If in M, you always stay in M (with
probability 1).

(a) Make a Markov chain model of this learning model. 

(b) If you start in state I, what is your probability distribution after two
lessons? After three lessons? 


8. (a) Make a Markov chain model for a rat wandering through the
following maze if at the end of each period, the rat is equally likely
to leave its current room through any of the doorways. The states 
of the Markov chain are the rooms. 


(b) If the rat starts in room 1, what is the probability that it is in room 
4 two periods later? 


9. Make a Markov chain model of a poker game where the states are the
number of dollars a player has. With probability .3 a player wins 1
dollar in a period, with probability .4 a player loses 1 dollar, and with
probability .3 a player stays the same. The game ends if the player
loses all his or her money or if the player has 6 dollars (when the game
ends, the Markov chain stays in its current state forever). The Markov
chain should have seven states, corresponding to the seven different
amounts of money: 0, 1, 2, 3, 4, 5, or 6 dollars. If you now have $2,
what is your probability distribution in the next round? In the round
after that?


10. Three tanks A, B, C are engaged in a battle. Tank A, when it fires, hits
its target with hit probability 5. B hits its target with hit probability 3, 
and C with hit probability @. Initially (in the first period), B and C fire 
at A and A fires at B. Once one tank is hit, the remaining tanks aim at 
each other. The battle ends when there is one or no tank left. Make a 
Markov chain model of this battle. 


Assistance in Computing Probabilities: Let the states of the Markov 
chain be the eight different possible subsets of tanks currently in action: 
ABC, AB, AC, BC, A, B, C, None. When in states A or B or C or 
None, the probability of staying in the current state is 1—this simulates 
the battle being over. One can never get to state AB. (Why?) So one 
only needs to determine the transition probabilities from states ABC, 
AC, and BC. From states AC and BC, the transition probabilities are 
products of the probability that each remaining tank hits or misses its 
target. For example, the probability of going from state AC to state A 
is the product of the probability that A hits C—3—times the probability 
that C misses A - ?. So this probability is ($)(@) = 7s. It takes some
knowledge of probability to compute the transition probabilities from 
state ABC. From ABC there is a 7% chance of remaining in state ABC 
(all tanks miss), a 7s chance of going to state AC (A hits B but B and 
C miss A), a 7 chance of going to state BC (at least one of B or C hits 
A and A misses B), and a 7s chance of going to state C (at least one of 
B or C hits A and A hits B). 


11. Use a computer program to follow the Markov chains in the following
examples and exercises for 50 periods by iterating the next-period for- 
mula (1) as done in Example 2. 

(a) Example 1, starting in state Sunny. 

(b) Example 1, starting in state Cloudy. 

(c) Example 2, starting in state 4. 

(d) Example 2, starting with p; = p, = .5, other p; = 0. 

(e) Exercise 4, starting in state Broken. 

(f) Exercise 5, starting in state Win. 

(g) Exercise 6, starting in state Market Unchanged. 

(h) Exercise 7, starting in state I.

(i) Exercise 8, starting in state Room 1.

(j) Exercise 9, starting in state $2.

(k) Exercise 9, starting in state $3.

(l) Exercise 9, starting in state $4.

(m) Exercise 10, starting in state ABC. 


12. For the rabbit-fox model in equations (8), use hand calculations to
verify the population sizes for months 1, 2, and 3 given in table (9).
To get the sizes after 1 month, set R = 100, F = 100 (the starting 
sizes) and evaluate the right sides of the equations in (8). Next take the 
values you obtained for R’ and F’ and let these be the new R and F. 
Repeat this process three times. 


13. For the rabbit-fox model in equations (8), suppose that the initial
population sizes are R = 50, F = 50.

(a) Calculate by hand the population sizes after 1 month, after 2
months, and after 3 months.

(b) Use a computer or calculator to compute the population sizes over 
8 months. 


14. Consider the following rabbit-fox models and an initial population size
of R = 100, F = 100. In each case, compute the population sizes after
1 month, after 2 months, and after 3 months.

(a) R' = R + .3R - .2F          (b) R' = R + .3R - .2F
    F' = F - .2F + .1R              F' = F - .1F + .2R


15. Consider the following goat-sheep models, where the two species
compete for common grazing land. In each case, compute the population
sizes after 1 month, after 2 months, and after 3 months if the initial
population is 50 goats and 100 sheep.

(a) G' = G + .2G - .3S          (b) G' = G + .2G - .1S
    S' = S + .2S - .2G              S' = S + .4S - .2G


16. This exercise concerns the rabbit-fox model in equations (15). For given
initial population sizes, calculate the population sizes after 1 month,
after 2 months, and after 3 months. Also plot the trajectory of population
sizes from the starting values to the stable sizes (as in Figure 1.8). The
initial sizes are

(a) R = 30, F = 24   (b) R = 8, F = 3   (c) R = 8, F = 10
(d) R = 10, F = 10

17. Consider the rabbit-fox model


R' = R + .1R - .15F
P= FoF > or 


What is the equation of the line of stable population sizes? For given
initial population sizes, calculate the population sizes after 1 month,
after 2 months, and after 3 months. Also predict the stable sizes to
which these populations are converging. Compare your numbers with
the calculations in Exercise 16. The initial sizes are

(a) R = 30, F = 24   (b) R = 8, F = 3   (c) R = 8, F = 10
(d) R = 10, F = 10


18. Consider the rabbit-fox model


R= K + UR = 32F 
pom ff = AP + QR 


On a graph plot the following: 

(a) The line of stable population sizes. 

(b) The trajectory of population sizes starting from (10, 15). 
(c) The trajectory of population sizes starting from (10, 30). 
(d) The trajectory of population sizes starting from (20, 10). 


19. Consider the rabbit-fox model


R' = R + .2R - .3F
F' = F - .3F + .2R


On a graph plot the following: 

(a) The line of stable population sizes. 

(b) The trajectory of population sizes starting from (10, 15). 
(c) The trajectory of population sizes starting from (1, 2). 


(d) How do the trajectories of this model differ from those in Figure 
1.8? 
20. Use a computer program to follow the behavior of the nonlinear
rabbit-fox model in (19) over a period of 100 months (as in Figure 1.9)
with the following starting values:

(a) R = 6, F = 6   (b) 100, 100   (c) 10, 10


21. In Example 4, if the change in R is .1R - .01RF in 1 month, then in
1 day we would expect 1/30 of such a change [i.e., (1/300)R - (1/3000)RF];
similarly for the change in foxes. Write out the full set of equations for
this model with time measured in days. Starting with R = 3, F = 3,
follow the populations as before for 3000 days (= 100 months). (Use
a computer program.) How do your results compare with those in table
(20)?


Projects 


22. Use a computer program to follow the populations for many periods in
the models in Exercises 14 and 15. Try a couple of different starting 
population sizes. In each case describe in words the long-term trends 
of the populations. 


23. Make a thorough analysis of long-term trends for the rabbit-fox model


R' = R + bR - eF
F' = F - dF + e'R


for different values of the positive parameters b, d, e, e’. That is, list 
all possible long-term trends and give conditions on the parameters that 
tell when each trend occurs. For example, one trend is that both pop- 
ulations become extinct, with rabbits dying out first. Determine the 
conditions experimentally by trying many different specific parameter 
values and in each case computing the population sizes over many 
months. 


Linear Programming and 
Models Without Exact Solutions 


In this section we examine two very important variations on the problem of 
solving n linear equations in n unknowns. We illustrate these variations with 
the refinery problem from Section 1.2. 


Example 1. Refinery Problem Revisited with One Refinery Broken

The original refinery problem had three refineries and three products:
heating oil, diesel oil, and gasoline. We wanted the production levels,
x1, x2, x3, of the refineries to meet demands of 500 gallons of heating
oil, 850 gallons of diesel oil, and 1000 gallons of gasoline. The
equations we got were


     Heating oil:  20x1 +  4x2 +  4x3 =  500
     Diesel oil:   10x1 + 14x2 +  5x3 =  850                 (1)
     Gasoline:      5x1 +  5x2 + 12x3 = 1000


Suppose that the third refinery breaks down and we have to try to meet
the demands with two refineries. Our system of equations is now


Two-Refinery Production Problem


     Heating oil:  20x1 +  4x2 =  500
     Diesel oil:   10x1 + 14x2 =  850                        (2)
     Gasoline:      5x1 +  5x2 = 1000


A system like (2) with more equations than unknowns is called
overdetermined and does not normally have a solution. All one can
ask for is an approximate solution. We want a "solution" of x1, x2
values that makes the total production of the two remaining refineries
as close as possible to the demands. In Chapter 5 we define precisely
the term "as close as possible" and then show how to solve this
problem.


If we take the situation in system (2) to a greater extreme, with, say, 
10 or 50 equations but just two unknowns, then we have a famous estimation 
problem in statistics. 


Example 2. Predicting Grades in College 


A guidance counselor at Scrooge High School wants to develop a 
simple formula for predicting a Scrooge graduate’s GPA (grade-point 
average) at the local state college as a function of the student’s GPA 
at Scrooge High. The formula would be a linear model of the form 


     college GPA = q × (Scrooge GPA) + r                     (3a)


or


     C = qS + r                                              (3b)


where C stands for college GPA, S stands for Scrooge GPA, and r, q
are constants to be determined based on the performances of past
Scrooge graduates. Suppose that the data from eight students are


                                                    Predicted C
Student   S (Scrooge GPA)   C (College GPA)        1.1 × S - .9

   A           3.0             [illegible]              2.4
   B           3.6                3.6                   3.06
   C           2.6                2.4                   1.96
   D           2.2                2.8                   2.62
   E           2.0                1.0                   1.3           (4)
   F           3.0                2.8                   2.4
   G           3.8                3.0                   3.28
   H           3.6                2.8                   3.06


Figure 1.10 Estimated equation for relation between Scrooge GPA and college GPA. [plot not reproduced]


One should pick the constants q and r so that the predicted college
GPA given by the expression qS + r will be as close as possible to
the actual college GPA for these students. Using a method discussed
in Chapter 4, we set q = 1.1 and r = -.9. The predicted college
GPAs with this formula are given in the last column of the table. Figure
1.10 has a plot of the C and S values from (4) along with the
line C = 1.1S - .9.
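
As a small illustration of how the last column of (4) is obtained, the Python sketch below evaluates C = 1.1S - .9 and the prediction error for six of the students. It is not part of the original text: the function name predict_college_gpa and the dictionary layout are invented here, and rows A and D are left out because their entries are not fully legible or consistent in this copy. The sketch only evaluates the given formula; it does not show how q and r were found.

```python
Q = 1.1            # slope q from the text
R_CONST = -0.9     # constant r from the text


def predict_college_gpa(scrooge_gpa, q=Q, r=R_CONST):
    """Predicted college GPA from the linear model C = q*S + r."""
    return q * scrooge_gpa + r


# (Scrooge GPA, actual college GPA) for the rows used in this sketch.
students = {
    "B": (3.6, 3.6),
    "C": (2.6, 2.4),
    "E": (2.0, 1.0),
    "F": (3.0, 2.8),
    "G": (3.8, 3.0),
    "H": (3.6, 2.8),
}

if __name__ == "__main__":
    for name, (s, c) in students.items():
        predicted = predict_college_gpa(s)
        print(f"{name}: S = {s:.1f}, actual C = {c:.1f}, "
              f"predicted C = {predicted:.2f}, error = {c - predicted:+.2f}")
```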


This is the same sort of problem that we faced in finding an approximate
solution to the refinery problem in Example 1. The statistical name for
this type of estimation problem is regression. 

Suppose that the counselor wanted to break down the Scrooge grade
average into various components G_M/S, G_E, and G_H/L, representing the
student's grades in the three subject areas of math/science, English, and
history/languages. The counselor would give these three components separate
weightings in a formula like

     G_C = q1 G_M/S + q2 G_E + q3 G_H/L + r                  (5)


The solution of regression problems is discussed in Chapters 4 and 5. 
Next let us consider the situation where we have more unknowns than 
equations. Again we use the refinery model. 


Example 3. Refinery Model Revisited Without Diesel Oil

Suppose now that there is no demand for diesel oil and the three
refineries just produce heating oil and gasoline. So the system of
equations to be satisfied is


Two-Product Refinery Problem


     Heating oil:  20x1 + 4x2 +  4x3 =  500                  (6)
     Gasoline:      5x1 + 5x2 + 12x3 = 1000


This system of two equations in three unknowns is called
underdetermined in the sense that there are not enough constraints to determine
each unknown uniquely. The solution we found in Section 1.2 for the
refinery problem with all three equations is clearly valid with two
equations: x1 = 5, x2 = 33, x3 = 68. But many other solutions are
possible. In particular, we could shut down one of the refineries, say
refinery 3, as in Example 1. In mathematical terms, we seek a solution
to (6) with x3 = 0. Dropping the x3 terms from (6), we have


     Heating oil:  20x1 + 4x2 =  500                         (7)
     Gasoline:      5x1 + 5x2 = 1000


This system of equations is easily solved by high school algebra:
Subtract four times the second equation from the first to eliminate x1
and obtain -16x2 = -3500 or x2 = 3500/16 = 218 3/4. Now the gasoline
equation becomes 5x1 + 5(218 3/4) = 1000; dividing this equation by
5 and solving for x1 yields x1 = -18 3/4. Unfortunately, a negative
value for x1 is nonsense.


Next try shutting down the first refinery by setting x1 = 0. We
have


     Heating oil:  4x2 +  4x3 =  500                         (8)
     Gasoline:     5x2 + 12x3 = 1000


Solving (8) for x2 and x3 yields the solution x2 = 500/7 ≈ 71, x3 =
375/7 ≈ 54. We could set x2 = 0 and get another solution. There are
many more solutions in which all the refineries are running.

Which solution would we use in practice? The answer is, the
solution that is most efficient. That is, the solution that is the cheapest.
Each refinery will have a cost of operation. Suppose that the costs to
refine a barrel of oil (the units for the xi's) are


Refinery Operation Costs


     Refinery 1    $30 per barrel
     Refinery 2    $25 per barrel                            (9)
     Refinery 3    $20 per barrel


Then we want a solution to the two-product production problem
(6) (with no xi negative) for which the total refining cost of 30x1 +
25x2 + 20x3 is minimized. The complete mathematical statement of
this problem is


Optimal Refinery Production Problem


     Minimize 30x1 + 25x2 + 20x3
     subject to the constraints

          20x1 + 4x2 +  4x3 =  500                           (10)
           5x1 + 5x2 + 12x3 = 1000
          x1 ≥ 0,  x2 ≥ 0,  x3 ≥ 0
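
To make the comparison of candidate solutions concrete, the following Python sketch solves the two equations in (10) three times, each time with one refinery shut down (one xi set to 0), discards any solution with a negative production level, and reports the cheapest of the rest. It is an illustration, not the general method of Section 4.6. The function names are invented here, and looking only at these three candidates rests on an assumption: that the cheapest solution is one in which a refinery is shut down, which is the corner-point property of linear programs.

```python
from fractions import Fraction
from itertools import combinations

A = [[20, 4, 4],      # heating oil produced per barrel in refineries 1, 2, 3
     [5, 5, 12]]      # gasoline produced per barrel in refineries 1, 2, 3
DEMAND = [500, 1000]
COST = [30, 25, 20]   # dollars per barrel, from table (9)


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 system exactly by Cramer's rule; None if singular."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        return None
    return (Fraction(b1 * a22 - a12 * b2, det),
            Fraction(a11 * b2 - b1 * a21, det))


def one_refinery_off_solutions():
    """Solutions of (6) in which exactly one x_i is forced to 0."""
    solutions = []
    for i, j in combinations(range(3), 2):   # refineries i and j keep running
        pair = solve_2x2(A[0][i], A[0][j], A[1][i], A[1][j], *DEMAND)
        if pair is not None:
            x = [Fraction(0)] * 3
            x[i], x[j] = pair
            solutions.append(x)
    return solutions


def total_cost(x):
    return sum(c * v for c, v in zip(COST, x))


def cheapest_feasible():
    feasible = [x for x in one_refinery_off_solutions()
                if all(v >= 0 for v in x)]
    return min(feasible, key=total_cost)


if __name__ == "__main__":
    for x in one_refinery_off_solutions():
        print([float(v) for v in x], "cost:", float(total_cost(x)))
    print("cheapest feasible:", [float(v) for v in cheapest_feasible()])
```

Exact fractions are used so that the values 218 3/4, 500/7, and 375/7 found above appear without rounding error.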


The problem of optimizing (minimizing or maximizing) a linear 
expression subject to constraints that are linear equations or inequalities is 
called linear programming. Linear programming is the most important 
mathematical tool in management science. There are thousands of different 
real-world problems that can be posed as linear programming problems. 
When scientists at Bell Laboratories recently proposed a new, more efficient 
way to solve linear programming problems, the announcement was a front- 
page story in major newspapers. 

The optimal refinery production problem (10) involved a set of linear
equations as constraints together with the inequalities x1 ≥ 0, x2 ≥ 0, x3 ≥
0. As we shall see shortly, it is easier to solve linear programs in which all
the constraints are inequalities. Exercise 13 shows how to convert the equa- 
tions in (10) into linear inequalities. This conversion is discussed further in 
Section 4.6. 

As an example of how linear programs with inequalities are solved, 
we look at a simple two-variable maximization problem. 


Example 4. A Linear Program: Optimal Production of Two Crops

Suppose that a farmer has 200 acres on which he can plant any
combination of two crops, corn and wheat. Corn requires 4 worker-days
of labor and $20 of capital for each acre planted, while wheat requires
1 worker-day of labor and $10 of capital for each acre planted. Suppose
also that corn produces $60 of revenue per acre and wheat produces
$40 of revenue per acre. If the farmer has $2200 of capital and 320
worker-days of labor available for the year, what is the most profitable
planting strategy?

If 


C = number of acres of corn 
W = number of acres of wheat 


the constraints on land, labor, and capital are given by the following 
system of linear inequalities: 


     Land:           C +   W ≤  200
     Labor:         4C +   W ≤  320                          (11)
     Capital:      20C + 10W ≤ 2200


also,
     W ≥ 0,  C ≥ 0


Subject to these constraints, we want to determine C and W so as to 
maximize the total revenue. 


Maximize 60C + 40W (12) 


The expression to be maximized is called the objective function.

When only two variables are involved, one can plot the inequality
constraints and display the region of (C, W)-points that simultaneously
satisfy all the constraints in (11). This region is called the feasible
region of the linear program, and its points feasible points. See the
shaded area in Figure 1.11. Recall that to find the points satisfying an
inequality such as C + W ≤ 200, we plot the line C + W = 200
and then shade the line and all points on the lower left side of the line.

Once we have plotted the feasible region for (11), it remains to 
find out which feasible point maximizes 60C + 40W. The following 
geometric insight greatly simplifies the solution of linear programs (this 
theorem is proved in Section 4.6). 


Theorem. A linear objective function assumes its maximum and min- 
imum values on the boundary of the feasible region. In fact, the optimal 
value is achieved at a corner point of this boundary. 


Now we can solve our linear program. The theorem tells us where 
to look for the optimal (C, W)-value—at the corners of the feasible 
region. To find a corner that lies at the intersection of two constraint 
lines, we solve for a (C, W) point that lies on both lines—the same 
old problem of solving two equations in two unknowns. Once we 
determine the coordinates of a corner of the feasible region, we can
evaluate the expression 60C + 40W at that corner. The corner that
maximizes this expression is the answer to our optimal production
problem. In the current problem we can see from Figure 1.11 which
intersections of constraint lines form corners of the feasible region.
Table 1.2 lists the coordinates of the corners and the associated
objective function values. So the optimal production schedule is to
plant 20 acres of corn and 180 acres of wheat. In Section 4.6 we
present a more general, systematic approach to find the maximizing
corner in a linear program.


Figure 1.11 Feasible region for two-crop linear program. [plot not reproduced]


Table 1.2


Corner          Intersecting            Objective
Coordinates     Constraints             Function

(0, 0)          C ≥ 0 and W ≥ 0             0
(0, 200)        C ≥ 0 and Land           8000
(20, 180)       Land and Capital         8400***
(50, 120)       Capital and Labor        7800
(80, 0)         Labor and W ≥ 0          4800
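
The corner-by-corner search that produces Table 1.2 can be sketched in Python as follows. This is an illustration of the procedure just described, not the systematic method of Section 4.6; the function names and the way constraints are stored are invented for the example. Each pair of constraint lines is intersected, intersections that violate some other constraint are discarded, and the objective 60C + 40W is evaluated at the corners that remain.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint (a, b, c, name) stands for a*C + b*W <= c.
CONSTRAINTS = [
    (1, 1, 200, "Land"),
    (4, 1, 320, "Labor"),
    (20, 10, 2200, "Capital"),
    (-1, 0, 0, "C >= 0"),
    (0, -1, 0, "W >= 0"),
]


def intersection(c1, c2):
    """Point where the two boundary lines meet, or None if parallel."""
    a1, b1, r1, _ = c1
    a2, b2, r2, _ = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return (Fraction(r1 * b2 - r2 * b1, det),
            Fraction(a1 * r2 - a2 * r1, det))


def is_feasible(point):
    C, W = point
    return all(a * C + b * W <= c for a, b, c, _ in CONSTRAINTS)


def corners():
    """Map each corner of the feasible region to the constraints that meet there."""
    found = {}
    for c1, c2 in combinations(CONSTRAINTS, 2):
        point = intersection(c1, c2)
        if point is not None and is_feasible(point):
            found.setdefault(point, f"{c1[3]} and {c2[3]}")
    return found


def best_corner(price_corn=60, price_wheat=40):
    """Corner that maximizes price_corn*C + price_wheat*W."""
    return max(corners(), key=lambda p: price_corn * p[0] + price_wheat * p[1])


if __name__ == "__main__":
    for (C, W), names in sorted(corners().items()):
        print(f"({C}, {W})  {names:22s} {60 * C + 40 * W}")
    print("best corner:", tuple(int(v) for v in best_corner()))
```

Changing the two prices passed to best_corner shows how the choice of corner depends on the objective function while the feasible region stays the same.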


Before leaving this example, we note that the farmer’s constraints 
(Land, Labor, and Capital) determined the feasible region, and the “‘mar- 
ketplace’’ (prices for corn and wheat) determined the objective function. If 
the market prices for corn or wheat change, or equivalently, the farmer 
receives a subsidy for one of the crops, the optimal solution may change. 
Table 1.3

Corner          Intersecting            Objective
Coordinates     Constraints             Function

(0, 0)          C ≥ 0 and W ≥ 0             0
(0, 200)        C ≥ 0 and Land           8000
(20, 180)       Land and Capital         9000
(50, 120)       Capital and Labor        9300***
(80, 0)         Labor and W ≥ 0          7200


For many years, the Federal Farm Program has offered crop subsidies in 
order to influence both the types of crops grown and the total number of 
acres planted. 

To illustrate how a crop subsidy can cause land to be taken out of 
production, let us suppose in Example 4 that the farmer receives a subsidy 
for corn that increases the revenue from $60 to $90 per acre. Then the 
objective function is now 90C + 40W. The values of this new objective 
function at the corner points of the feasible region are shown in Table 1.3. 
The new optimal strategy is to plant 50 acres of corn and 120 acres of wheat,
for a total of 170 acres. The subsidy results in the farmer removing 30 acres
from production.


Section 1.4 Exercises 


Summary of Exercises 
Exercises 1-5 involve overdetermined systems and regression. Exercises
6-12 involve linear programming. Exercise 13 tells how to convert a system
of equations into a system of inequalities (this conversion is discussed further
in Chapter 4).


1. Use a trial-and-error approach to estimate as closely as possible an 
approximate solution to the refinery problem in Example 1. 


2. (a) Repeat Exercise 1, but now refinery 2 rather than refinery 3 is
missing.
(b) Repeat Exercise 1, but now refinery 1 rather than refinery 3 is
missing.
(c) If you had to close down one refinery, which refinery would you 
pick in order to meet the demand as closely as possible with the 
remaining two refineries? 


3. Consider the following system of equations, which might represent sup- 
ply-demand equations for chairs, tables, and sofas from two factories: 
             Factory 1    Factory 2    Demand


Chairs: Mx of ja = -200 
Tables: Jen oF =m = 150 
Sofas: 3, + 4% «= 100 


Find an approximate solution to this system of equations by trial and 
error. 


4. For the following sets of x-y points, estimate a line to fit the points as 
closely as possible. 
(a) (1, 1), (2, 3), G, 2), G, 6), (5, 5) 
(b) (1, 6), (2, 4), (3, 3), G, 2), (4, —D, , 0) 


5. Suppose that the estimate for GPA in college in Example 2 had been 
Gc = 6G /s 5 3G, + 3G mat l 


where G_M/S, G_E, and G_H/L are the GPAs in mathematics/science,
English, and humanities/languages. Based on this predictor, on which 
courses should students work hardest (if students want to improve their 
expected college GPA)? 


6. Find a solution to the refinery problem in Example 3 in which the values 
of x, and x, are the same. | 


7. Change the labor constraint in the crop linear program of Example 4 to 
be 4C + 2W ≤ 320. Now what would be the optimal solution?


8. Suppose that a Ford Motor Company factory requires 7 units of metal, 
20 units of labor, 3 units of paint, and 8 units of plastic to build a car, 
while it requires 10 units of metal, 24 units of labor, 3 units of paint, 
and 4 units of plastic to build a truck. A car sells for $6000 and a truck 
for $8000. The following resources are available: 2000 units of metal, 
5000 units of labor, 1000 units of paint, and 1500 units of plastic. 

(a) State the problem of maximizing the value of the vehicles produced 
with these resources as a linear program. 

(b) Plot the feasible region of this linear program. 

(c) Solve this linear program by the method in Example 4, by deter- 
mining the coordinates of the corners of the feasible region and 
finding which corner maximizes the objective function. 


Hint: By looking at the objective function, you should be able to 
tell which corners are good candidates for the maximum. 


9. Suppose that a meal must contain at least 500 units of vitamin A, 1000
units of vitamin C, 200 units of iron, and 50 units of protein. A dietician
has the following two foods from which to choose:

Meat: One unit of meat has 20 units of vitamin A, 30 units of vitamin
C, 10 units of iron, and 15 units of protein.

Fruit: One unit of fruit has 50 units of vitamin A, 100 units of vitamin
C, 1 unit of iron, and 2 units of protein.

Meat costs 50 cents a unit and fruit costs 40 cents a unit. 


(a) State the problem of minimizing the cost of a meal that meets all 
the minimum nutritional requirements as a linear program (now 
you want to minimize the objective function). 

(b) Plot the boundary of the feasible region for this linear program. 

(c) Solve this linear program by the method in Example 4 [see Exercise 
8, part (c)]. 


10. Consider the two-refinery problem in Example 1.


Heating oil:  20x_1 +  4x_2 = 500
Diesel oil:   10x_1 + 14x_2 = 850     (2)
Gasoline:      5x_1 +  5x_2 = 1000


Suppose that it costs $30 to refine a barrel in refinery 1 and $25 a barrel
in refinery 2. What is the production schedule (i.e., values of x_1, x_2)
that minimizes the cost while producing at least the amounts demanded
of each product (i.e., at least 500 gallons of heating oil, etc.)?


Hint: Solve by the method in Exercise 8, part (c). 


11. Consider the following two linear programs.
(i) Maximize 3x_1 + 3x_2          (ii) Minimize 10x_1 + 8x_2
    subject to                         subject to
    x_1 ≥ 0, x_2 ≥ 0                   x_1 ≥ 0, x_2 ≥ 0
    x_1 + 2x_2 ≤ 10                    x_1 + 2x_2 ≥ 3
    2x_1 + x_2 ≤ 8                     2x_1 + x_2 ≥ 3


Solve them and show that the optimum values of these two objective 
functions are the same. 


12. Set up the following problem as a linear program, but do not solve.
There are two truck warehouses and two stores that sell trucks. The
following table gives the cost of transporting a truck from one of the
warehouses to one of the stores.


               Store 1   Store 2

Warehouse 1      $40       $50
Warehouse 2      $60       $40

Warehouse 1 has 100 trucks and warehouse 2 has 80. Store 1 needs at
least 50 trucks and store 2 needs at least 100 trucks. Find the cheapest
way to meet the stores’ demand.


Hint: Let the variables be x_11, x_12, x_21, and x_22, where x_ij is the amount
shipped from warehouse i to store j.


13. To convert a system of equations in which each variable must be ≥ 0
into a system of inequalities with each variable ≥ 0, we perform the
following steps.

(i) Pick a variable in the first equation and solve that equation for the
chosen variable, that is, so that the chosen variable is alone on one
side of the equation. For example, in the system of equations

2x + 4y + 6z = 8
 x + 3y + 2z = 6
x ≥ 0, y ≥ 0, z ≥ 0

if we pick x in the first equation, then we rewrite it as

x = -2y - 3z + 4     (*)

(ii) Replace the chosen variable in the other equations by substituting
in its place the right-hand side in (*). So the second equation in
this problem becomes

(-2y - 3z + 4) + 3y + 2z = 6

or

y - z = 2

(iii) Since the chosen variable is ≥ 0, the right side of (*) must be
≥ 0, that is,

-2y - 3z + 4 ≥ 0

or

4 ≥ 2y + 3z (equivalently, 2y + 3z ≤ 4)

Now the original three-variable problem has been reduced to a
two-variable problem with one equation converted into an inequality.

2y + 3z ≤ 4
 y -  z = 2
y ≥ 0, z ≥ 0

(iv) Repeat the entire procedure for a variable in the second equation.
If we pick y, solve the second equation for y in terms of z, substitute
this expression involving z in place of y in the first inequality; also
make this expression in z be ≥ 0.


(a) Complete step iv. 

(b) Convert the refinery linear program at the end of Example 3 into a 
linear program with inequalities, and solve the linear program. 

(c) Convert the following system of equations for nonnegative vari- 
ables into a system of inequalities. 


x_1 ≥ 0, x_2 ≥ 0, x_3 ≥ 0


Section 1.5 Arrays of Data and Linear Filtering


In the previous sections we have encountered arrays of numbers that were 
the coefficients of systems of equations. But not all arrays of numbers are 
sets of coefficients. There are many problems in which arrays of numbers 
are input data to be analyzed. In statistics, we study huge data sets that come 
in sequences, two-dimensional arrays, and more complex structures. In the 
field of information processing and pattern recognition, certain information 
must be extracted from the data, be it a coded message or a picture. Both 
of these fields make heavy use of linear models to process arrays of data. 

The examples in this section illustrate the use of linear models in 
pattern recognition and encoding of information. First we consider an ex- 
ample in which the data to be processed are letters, not numbers. 


Example 1. Linear Models for Encoding 
Alphabetic Messages 


A common approach to coding an alphabetic message is to treat each
letter in the message as a number between 1 and 26: A → 1, B → 2,
C → 3, ..., Z → 26. The simplest way to encode a message is to
convert each letter (number) to a different letter (number) using some
simple arithmetic formula. For example, we could add 7 to each number
(this shifts the corresponding letter seven places to the right in the
alphabet) or multiply each number by 11. However, these operations
will sometimes convert a number between 1 and 26 to a number greater
than 26.

To ensure that the result of some calculation is a number between
1 and 26, we usually assume that all arithmetic is done mod 26. As
an example,

7 × 13 = 91 ≡ 13 (mod 26) since 91 = 3 × 26 + 13


Encoding schemes based on adding a constant to each number
(letter) or multiplying by a constant are easy for an outsider to break
because once one letter is guessed, the constant can be determined.
For example, if X_o is the original letter and X_c the coded letter, the
encoding

X_c = 7X_o     (1)

transforms THE into JDI since in numbers T = 20, H = 8, and
E = 5, so

7·T = 7·20 = 140 ≡ 10 (mod 26) = J
7·H = 7·8  =  56 ≡  4 (mod 26) = D
7·E = 7·5  =  35 ≡  9 (mod 26) = I

Now if we guess that I is the encoding of E, it is easy to compute that
the constant in (1) is 7. Even a code with multiplication and addition,
such as X_c = 9X_o + 21, is easy to break.
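
The single-letter scheme in (1) can be tried out with a short sketch. The function names (`to_number`, `to_letter`, `encode`, `recover_multiplier`) are our own and purely illustrative; we also assume that a result congruent to 0 is written as 26 (the letter Z), so that every answer stays between 1 and 26.

```python
import string

ALPHABET = string.ascii_uppercase  # 'A'..'Z', numbered 1..26 as in the text


def to_number(letter):
    """A -> 1, B -> 2, ..., Z -> 26."""
    return ALPHABET.index(letter) + 1


def to_letter(number):
    """Reduce mod 26, writing a residue of 0 as 26 (assumption: 0 means Z)."""
    residue = number % 26
    return ALPHABET[(residue or 26) - 1]


def encode(message, multiplier=7, shift=0):
    """Encode each letter X as multiplier*X + shift (mod 26)."""
    return "".join(to_letter(multiplier * to_number(ch) + shift) for ch in message)


def recover_multiplier(plain_letter, coded_letter):
    """All constants k in 1..26 with k*plain = coded (mod 26)."""
    p, c = to_number(plain_letter), to_number(coded_letter)
    return [k for k in range(1, 27) if (k * p) % 26 == c % 26]


coded = encode("THE")                   # multiply each letter by 7
guess = recover_multiplier("E", "I")    # one guessed letter pins down the constant
```

One correct guess (I stands for E) leaves a single candidate constant, which is why such a code is easy to break.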

A better scheme, in which frequent words and letters are scrambled,
is to encode numbers in pairs using two linear equations of the
following form. Let L_1, L_2 be a pair of original letters (represented as
numbers between 1 and 26), and C_1, C_2 be the coded letters (also
represented as numbers) into which L_1, L_2 are transformed.


C_1 = aL_1 + bL_2 (mod 26)     (2)
C_2 = cL_1 + dL_2 (mod 26)


For example, in the scheme


C_1 = 9L_1 + 17L_2 (mod 26)    (3)
C_2 = 7L_1 +  2L_2 (mod 26)


the pair E, C, represented numerically as 5, 3, would be encoded as


C_1 = 9 × 5 + 17 × 3 = 96 ≡ 18 (mod 26) = R     (4)
C_2 = 7 × 5 +  2 × 3 = 41 ≡ 15 (mod 26) = O


To use (2) for a whole message, we divide the string of m letters
(ignoring punctuation) into m/2 successive pairs. Observe that if the
fifth letter in the message were an E, the fifth letter in the encoded
sequence would vary depending on what the sixth letter was [with
which E is paired in (2)]. There are four constants [a, b, c, d in (2)]
in this encoding scheme, and hence 26^4 = 456,976 different possible
schemes. Moreover, there are no meaningful patterns of frequently
used letters to help a codebreaker.

If the codebreaker had access to a large computer, we could
counter this by grouping letters in sets of 5 and replacing (2) by a scheme
involving five equations of linear combinations of five letters. Now
there would be 25 constants yielding 26^25 ≈ 10^35 schemes, and we
can go to sleep knowing that our code is secure.

Although it is important to keep this code from being broken, it 
is also important that a code not be too hard to decode by a receiver. 
If the receiver knows the constants in (2), he or she still has to reverse 
the encoding process by solving a pair of equations in two unknowns. 
For example, if (3) were being used and the pair R, O (= 18, 15) 
generated in (4) were received, the decoder would have to solve the 
system of equations 


9L_1 + 17L_2 ≡ 18 (mod 26)     (5)
7L_1 +  2L_2 ≡ 15 (mod 26)


For a more complex scheme of five equations in five unknowns,
the decoding problem gets harder, especially since arithmetic is mod
26. Fortunately, we shall show in Chapter 3 that there exist simple
formulas for decoding so that the original pair L_1, L_2 (or 5-tuple) can
be computed as a linear combination of the coded pair C_1, C_2. For
example, the decoding equations for (3) are


L_1 = 18C_1 + 3C_2 (mod 26)     (6)
L_2 = 15C_1 + 3C_2 (mod 26)
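
The pair scheme (3) and its decoding equations (6) can be checked with a minimal sketch. The helper names are hypothetical, the message is assumed to have an even number of capital letters, and a residue of 0 is again written as 26 (Z).

```python
A_ORD = ord("A")


def num(ch):
    """A -> 1, ..., Z -> 26."""
    return ord(ch) - A_ORD + 1


def let(n):
    """Reduce mod 26, writing a residue of 0 as 26 (assumption: 0 means Z)."""
    r = n % 26
    return chr(A_ORD + (r or 26) - 1)


def transform_pairs(text, a, b, c, d):
    """Apply C1 = a*L1 + b*L2, C2 = c*L1 + d*L2 (mod 26) to successive pairs."""
    out = []
    for k in range(0, len(text), 2):
        l1, l2 = num(text[k]), num(text[k + 1])
        out.append(let(a * l1 + b * l2))
        out.append(let(c * l1 + d * l2))
    return "".join(out)


def encode(text):
    return transform_pairs(text, 9, 17, 7, 2)    # constants of scheme (3)


def decode(text):
    return transform_pairs(text, 18, 3, 15, 3)   # constants of equations (6)


coded = encode("EC")      # the pair worked out in (4)
plain = decode(coded)     # applying (6) recovers the original pair
```

Decoding uses the same routine as encoding with different constants, which is the point of (6): the receiver never has to solve system (5) by hand.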


The next two examples involve data analysis. 


Example 2. The Mean of a Data Set


The most basic piece of statistical information about a set of data is
the mean, or average, of the data. The mean is obtained by summing
all data values and dividing by the number of data values. For example,
the mean of the sequence of numbers 2, 3, 13, 3, 7, 1, 9, 3, 4, 5 is
(2 + 3 + 13 + 3 + 7 + 1 + 9 + 3 + 4 + 5)/10 = 5.

The mean is a linear combination of these 10 data values in which
each value is multiplied by 1/10 (recall from Section 1.1 that a linear
combination is an expression of the form c_1x_1 + c_2x_2 + ... + c_nx_n).
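
As a small check, the mean can be computed literally as a linear combination with every coefficient equal to 1/10. The function name `linear_combination` is our own; exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

data = [2, 3, 13, 3, 7, 1, 9, 3, 4, 5]


def linear_combination(coefficients, values):
    """c_1*x_1 + c_2*x_2 + ... + c_n*x_n"""
    return sum(c * x for c, x in zip(coefficients, values))


# Every data value gets the same coefficient 1/n.
weights = [Fraction(1, len(data))] * len(data)
mean = linear_combination(weights, data)
```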


Example 3. Smoothing a Time Series 


In many situations one receives a sequence of numbers recorded over 
time that form a pattern, but randomness in nature or in recording and 
transmitting the numbers has obscured the pattern. Such sequences are 
called time series. Consider the series of readings in Data Plot 1 taken

Time   Data
  1     23
  2     27
  3     21
  4     32
  5     29
  6     26
  7     30
  8     29
  9     26
 10     26
 11     27
 12     19
 13     24
 14     22
 15     25
 16     20
 17     16
 18     27
 19     19
 20     15
 21     13
 22     25
 23     18
 24     22
 25     21
 26     24
 27     25
 28     28
 29     23
 30     27
Data Plot 1


over a period of time. Suppose that this time series gives the numbers 
of people applying for welfare aid in some city in successive months 
(the numbers are presented in units of 100). The values might equally 
well have represented levels of X-rays measured in a spacecraft or the 
numbers of new houses started in the United States in successive 
months. 

To help picture the data, we plot the numbers in a graph, with 
time measured on the vertical axis and the data values on the horizontal 
axis. (The axes are omitted in the graph.) 

We want to try to find a long-term pattern in this time series by 
smoothing the data—that is, reducing the jumps in data from one 
period to the next. In engineering, the task of smoothing a noisy electronic
signal is called filtering (the term is now also used in nonengineering
settings).

The simplest way to filter a time series is by replacing the ith
value d_i by the average of d_i and the two adjacent values d_{i-1} and
d_{i+1}. That is, we form a new time series whose ith value d'_i is given
by

d'_i = (d_{i-1} + d_i + d_{i+1}) / 3     (7)

For example, we replace d_15 by

d'_15 = (d_14 + d_15 + d_16) / 3 = (22 + 25 + 20) / 3 ≈ 22

(When the value of d'_i is fractional, we shall round to the nearest
integer.) The formula in (7) is not defined for the first and last values
(i = 1 and i = 30); instead, let us set d'_1 = (d_1 + d_2)/2 and
d'_30 = (d_29 + d_30)/2. The complete system of linear equations (the
linear model) for filtering is then, with n = 30,

d'_1     = (1/2)d_1 + (1/2)d_2
d'_2     = (1/3)d_1 + (1/3)d_2 + (1/3)d_3
d'_3     = (1/3)d_2 + (1/3)d_3 + (1/3)d_4               (8)
  ...
d'_{n-1} = (1/3)d_{n-2} + (1/3)d_{n-1} + (1/3)d_n
d'_n     = (1/2)d_{n-1} + (1/2)d_n


Our new time series looks as shown in Data Plot 2. This time series 
is much smoother. There is a clear trend of increasing values, then 
decreasing, then increasing, and finishing relatively level. 
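
The filtering model (8) is easy to sketch in code. The names `smooth3` and `round_half_up` are our own, and we assume that "round to the nearest integer" means halves round up.

```python
import math

DATA_PLOT_1 = [23, 27, 21, 32, 29, 26, 30, 29, 26, 26,
               27, 19, 24, 22, 25, 20, 16, 27, 19, 15,
               13, 25, 18, 22, 21, 24, 25, 28, 23, 27]


def round_half_up(x):
    """Nearest integer, with halves rounded up (an assumption about the text's rounding)."""
    return math.floor(x + 0.5)


def smooth3(d):
    """Three-value average (7); the two end values average just two readings."""
    n = len(d)
    out = []
    for i in range(n):
        if i == 0:
            value = (d[0] + d[1]) / 2
        elif i == n - 1:
            value = (d[n - 2] + d[n - 1]) / 2
        else:
            value = (d[i - 1] + d[i] + d[i + 1]) / 3
        out.append(round_half_up(value))
    return out


data_plot_2 = smooth3(DATA_PLOT_1)
```

Applied to Data Plot 1, this should reproduce the column of values in Data Plot 2.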

To smooth these data further, we could apply the smoothing
transformation (8) again to this new time series. Instead, let us smooth
the original data (in Data Plot 1) by applying a weighted average of
five values in which d_i is weighted more and d_{i-2} and d_{i+2} are weighted
less:

d''_i = (d_{i-2} + 2d_{i-1} + 3d_i + 2d_{i+1} + d_{i+2}) / 9     (9)

For example, now d_15 is replaced by

d''_15 = (24 + 2 × 22 + 3 × 25 + 2 × 20 + 16) / 9 = 199/9 ≈ 22

Time   Data
  1     25
  2     24
  3     27
  4     27
  5     29
  6     28
  7     28
  8     28
  9     27
 10     26
 11     24
 12     23
 13     22
 14     24
 15     22
 16     20
 17     21
 18     21
 19     20
 20     16
 21     18
 22     19
 23     22
 24     20
 25     22
 26     23
 27     26
 28     25
 29     26
 30     25
Data Plot 2


For d''_1, drop the missing terms from (9) to obtain d''_1 = (3d_1 +
2d_2 + d_3)/6, and similarly for d''_2, d''_29, d''_30. The new time series
obtained when transformation (9) is applied to the original data is as
shown in Data Plot 3. Observe how smooth this transformed time series
is as compared to the original one. The Exercises have other examples
of time series for which filtering reveals important trends.
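
The weighted five-value filter (9), including the rule of dropping missing terms near the ends, can be sketched as follows. The name `smooth5` is hypothetical, and halves are assumed to round up.

```python
import math

DATA_PLOT_1 = [23, 27, 21, 32, 29, 26, 30, 29, 26, 26,
               27, 19, 24, 22, 25, 20, 16, 27, 19, 15,
               13, 25, 18, 22, 21, 24, 25, 28, 23, 27]

# Weight attached to d_{i+offset} in formula (9).
WEIGHTS = {-2: 1, -1: 2, 0: 3, 1: 2, 2: 1}


def smooth5(d):
    """Weighted average (9); terms that fall off either end are dropped and
    the divisor shrinks to the sum of the weights actually used."""
    out = []
    for i in range(len(d)):
        total = 0
        weight_sum = 0
        for offset, w in WEIGHTS.items():
            j = i + offset
            if 0 <= j < len(d):
                total += w * d[j]
                weight_sum += w
        out.append(math.floor(total / weight_sum + 0.5))  # round half up (assumption)
    return out


smoothed = smooth5(DATA_PLOT_1)
```

For the first value the divisor becomes 3 + 2 + 1 = 6, which matches the formula for d''_1 given in the text.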

We can show that applying the three-value average (7) to Data
Plot 2 or, equivalently, applying the three-value average twice to the
original data, produces the same time series as in Data Plot 3. In
general, successively performing two (or more) linear filterings is
equivalent to performing another (more complicated) filtering; see the
Exercises for examples.

Time   Data
  1     24
  2     25
  3     26
  4     28
  5     28
  6     29
  7     28
  8     28
  9     27
 10     26
 11     25
 12     23
 13     23
 14     23
 15     22
 16     21
 17     21
 18     21
 19     19
 20     18
 21     17
 22     19
 23     20
 24     21
 25     22
 26     24
 27     25
 28     26
 29     26
 30     26
Data Plot 3
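
The remark that two successive three-value averages amount to the weighted average (9) can be checked on the weights alone. The sketch below (the function name `compose` is our own) combines two filters given as lists of weights centered on d_i. It covers interior values only; the special end formulas and the intermediate rounding are ignored.

```python
from fractions import Fraction


def compose(first, second):
    """Weights of the single filter equivalent to applying `first`, then `second`.
    Both inputs are odd-length weight lists centered on d_i."""
    out = [Fraction(0)] * (len(first) + len(second) - 1)
    for i, a in enumerate(first):
        for j, b in enumerate(second):
            out[i + j] += a * b
    return out


three_point = [Fraction(1, 3)] * 3          # formula (7)
twice = compose(three_point, three_point)   # weights on d_{i-2}, ..., d_{i+2}
```

The resulting weights are 1/9, 2/9, 3/9, 2/9, 1/9, the coefficients in (9).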


Example 4. Linear Filtering in 
Pattern Recognition 


When the TV camera that serves as the eyes of a robot transmits a 
picture to the robot’s computer, the picture is sent as a two-dimensional 
array of numbers that indicate the darkness of each point in the picture. 
A computer program to perform pattern recognition must determine 
what the robot is seeing by analyzing this digital representation of the 
picture. Light reflecting off an object or confusing patterns in the back- 
ground can make a simple object quite difficult to recognize. Trans- 
formations to filter the data and increase the level of contrast are an 
essential part of any pattern recognition program.

Suppose that there are nine shades of darkness of a point, represented
by integers 0 through 8, 0 for white (least dark) and 8 for
black.

[Legend: the printed symbols for darkness values 0 through 8]

A picture represented by the 12 × 12 array of darkness values shown
in Data Plot 4 has been received from a robot’s TV camera (of course,
a 12 × 12 array would only be a small section of the full TV image).


[Data Plot 4: 12 × 12 array of darkness values]


Let us perform a linear filtering on this array of darkness values.
We replace each value by a weighted average of the value and the
eight values surrounding it, as denoted by 1’s and 4’s in Figure 1.12.

[Figure 1.12: the 3-by-3 pattern of weights

    1   4   1
    4  16   4
    1   4   1

and, at an edge of the array, the border weights 5, 20, 5]

We use a weighting in which the old value gets a weight of 16, the 
four neighboring values in the same row or column (denoted by 4’s in 
Figure 1.12) get a weight of 4, and the diagonal neighboring values 
(denoted by 1’s in Figure 1.12) get a weight of 1; we divide this 
weighted sum by 36. (At the edges of the array, we increase the 
weights on the border values to 5, 20, 5, as shown in Figure 1.12.) 
The transformed array is shown in Data Plot 5.

[Data Plot 5]


We might now start to perceive a person in the figure. But to see
the person clearly we need greater contrast. To increase the contrast
between light and dark, we apply the following linear function with
roundoff: f(x) = 3x − 9; a value below 0 is rounded to 0 and a value
above 8 is rounded to 8. The table for this contrast function is


Old Value   0  1  2  3  4  5  6  7  8


New Value   0  0  0  0  3  6  8  8  8
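
The two picture transformations of this example, the weighted 3-by-3 filter and the contrast function, can be sketched together. The function names are our own. Two points are assumptions rather than statements from the text: positions outside the array are replaced by the nearest border value (along an edge this gives exactly the weights 5, 20, 5; the corners are our guess), and filtered values are rounded with halves going up.

```python
import math

# Weights of Figure 1.12: 16 for the old value, 4 for row/column neighbors,
# 1 for diagonal neighbors; the weighted sum is divided by 36.
WEIGHTS = [[1, 4, 1],
           [4, 16, 4],
           [1, 4, 1]]


def filter_picture(picture):
    """Weighted 3-by-3 average of a rectangular array of darkness values."""
    rows, cols = len(picture), len(picture[0])

    def clamp(k, size):
        # Out-of-range positions fall back on the nearest border cell (assumption).
        return min(max(k, 0), size - 1)

    result = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    value = picture[clamp(r + dr, rows)][clamp(c + dc, cols)]
                    total += WEIGHTS[dr + 1][dc + 1] * value
            new_row.append(math.floor(total / 36 + 0.5))
        result.append(new_row)
    return result


def contrast(x):
    """f(x) = 3x - 9, with results forced into the range 0..8."""
    return max(0, min(8, 3 * x - 9))


contrast_table = [contrast(old) for old in range(9)]
```

A full run on a picture would be `[[contrast(v) for v in row] for row in filter_picture(picture)]`, where `picture` is a list of rows of integers from 0 to 8.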


With this contrast function, Data Plot 5 becomes Data Plot 6. 
Now the person is fairly visible, perhaps with an object by the left 
foot. Greater contrast would help a little more. A good computer pro- 
gram to recognize patterns should now be able to “‘see’’ that the object 
pictured is a human being. 


[Data Plot 6]

"Adaptive" filtering schemes do a small amount of filtering and
then look for "borders" between light and dark regions. The region
around a border is then subjected to a contrast transformation to
accentuate the border, while nonborder regions are filtered as above.


Section 1.5 Exercises 


Summary of Exercises 

Exercises 1-5 are associated with Example 1 about codes. Exercises 6-12
are associated with Example 3 on filtering time series; Exercises 10-12
involve algebraic composition of transformations and require more
maturity. Exercises 13-15 are associated with Example 4 about
two-dimensional arrays forming "pictures."


1. Evaluate the following expressions mod 26. All answers must be
positive numbers between 1 and 26.

(a) 7 × 7   (b) 12 × 5   (c) −3 × 5   (d) −11 × 19
(e) 12 × (14 + 17)


2. Use the encoding C = 5L + 7 (mod 26) to encode the letters in the
following words.
(a) BE (b) AT (c) APE


3. Use the encoding in equations (3) to encode the following pairs of
letters.
(a) BE (b) AT (c) CC


4. Use the decoding in equations (6) to decode the following pairs of
letters.
(a) BG (b) COC. (c) RD


5. Determine the value(s) of x that satisfy the following equations
mod 26. Which equations have unique solutions?
(a) 9x = 11 (b) 7x = 13 (c) 14x = 3 (d) 11x = 9


6. Apply the filtering transformation in formula (7) to the first 15 numbers
in the time series in Data Plot 2. (You should get the same results as
in Data Plot 3.)


7. Apply the following filtering transformations to Data Plot 1 (explain
how you alter these transformations for the first and last values).

(a) d'_i = (d_{i-2} + d_i + d_{i+2}) / 3

(b) d'_i = (d_{i-2} + d_{i-1} + d_i + d_{i+1} + d_{i+2}) / 5

(c) d'_i = (d_{i-2} + d_{i-1} + d_{i+1} + d_{i+2}) / 4

8. Consider the time series 2, 10, 4, 12, 6, 14, 8, 16, 10, 18, 12, 20.
Apply transformations (a), (b), and (c) from Exercise 7. Which of the
transformations smooth this time series well, and which do a poor job?


9. Consider the time series 1, 4, 2, 5, 8, 6, 3, 10, 3, 12, 10, 9, 8, 12,
18, 13, 21, 16, 16. Apply transformations (a), (b), and (c) from
Exercise 7. Which of the transformations smooth this time series well, and
which do a poor job?


10. Show algebraically that if the transformation in formula (7) were applied
to a time series and applied again to the resulting time series, then the
cumulative result would be the same as the transformation in formula
(9).


11. (a) Suppose that the transformation in Exercise 7, part (a) is applied
twice (as described in Exercise 10). Give a formula for the
cumulative transformation.

(b) Suppose that the transformation in Exercise 7, part (a) is applied
to a time series and then the resulting time series is filtered by the
transformation in Exercise 7, part (b). Give a formula for the
cumulative transformation.


12. Suppose that d'_i = a_1d_{i-1} + a_2d_i + a_3d_{i+1}, d''_i = b_1d_{i-1} + b_2d_i +
b_3d_{i+1}. Give a formula for the transformation obtained by performing
the first and then the second transformation on a time series.


13. Apply the contrast function (given just before Data Plot 6) to Data
Plot 6.


14. Apply the following filtering transformations to the upper 8-by-8 corner
of Data Plot 4 (just the first 8 rows and first 8 columns), and then apply
the contrast function given in the text. The transformations are described
by a 3-by-3 square of weights, as in Figure 1.12. Explain what you did
at the borders.


wat tr (b) 0 
a l 
Mov dy, 3 0 


How effectively does each transformation help reveal the human being
in the picture?

15. Consider the following "pictures," given in terms of numbers rather
than darkness levels. Apply the transformation in Example 4 followed
by the contrast function. Give your answer in darkness levels. What is
the letter or number in each picture?


fe) i a a (ee bo 7 25-8 Cc) So Do ez 
ee ae Se ee y & 2s 7 4dgs 3 8 
P 2 alee a 2S t-i6 2 & © 1 7 
Ya Sale aM ses oo eae F 2.6. 3 33 
i. & OF SS ' eee FS o ® £6 6 
Ss 4 DS & Ss 6° 3, 2°53 5 > & 2 4.79 
7 8 4 9 ? & 4 6 7 So ke OO 
> 7 & 8 6 a ek Se oy mee ey Ss 


Matrices 


Section 2.1 Examples of Matrices


An essential tool in all mathematical modeling is good notation. This is 
especially true for models with large systems of equations or arrays of data. 
Two characteristics of good notation are 


1. To provide a way to express complex operations simply. 
2. To help a reader concentrate on the central features of a model without
being overwhelmed by numbers.


Most data can be naturally organized into tables. Sometimes the table 
consists of a single list, as in a list of scores of students on a test. Sometimes 
the table has the form of a rectangular array with several columns and rows, 
as in a teacher’s record of the scores of students on all tests in a course; 
here we have one column for each test and one row for each student. The
mathematical name for a rectangular array of numbers is a matrix. The most 
common type of matrix in mathematical applications is the array of the 
coefficients in some system of linear equations. 

In this section we introduce basic matrix notation. Matrix notation 
takes most people a little time to learn. But in a short while the reader will 
find it impossible to talk about linear models without using matrix notation. 

A matrix is a rectangular array of numbers. We speak of an m-by-n
matrix when the matrix has m rows and n columns, and we use capital
boldface letters, such as A, to denote matrices. (The common handwritten
way to indicate a matrix is with a wavy line under the letter, such as A.)
We use the notation a_ij to denote the number in matrix A occurring in row
i and column j. This is similar to the computer programming notation A(I,J).
Examples of matrices are
Examples of matrices are 


4. 3-2 
Pa See id Melee 4 (1) 
ah 36:6 = - 

5 ear fk 


An ordered list of n numbers is called a vector or an n-vector. We use
lowercase boldface letters, such as v, to denote vectors; v_i is our name for
the ith entry in vector v. Examples of vectors are


                               [7]
v = [1, 2, 3, 4]   and   c =   [8]
                               [9]


Sometimes we write a vector as a row of numbers, sometimes as a column, 
but a vector is formally just an ordered list. 

An n-vector is just a 1-by-n matrix or an n-by-1 matrix. Conversely,
an m-by-n matrix A can be thought of as a set of m row vectors (each of
length n) or as a set of n column vectors (each of length m). We use the
following notation:


a_i^R denotes the ith row vector in A.
a_j^C denotes the jth column vector in A.


We omit the R (or C) superscript when it is clear from the discussion that
we are talking about rows (or columns).


For example, in the matrix A in (1), 
2 
a, = 2, a, = 5, at = [l, 2, 3], aS = | 


Summarizing our matrix notation, we can write a general matrix A in
the following ways:


      [ a_11  a_12  a_13  ...  a_1j  ...  a_1n ]     [ a_1^R ]
      [ a_21  a_22  a_23  ...  a_2j  ...  a_2n ]     [ a_2^R ]
      [ a_31  a_32  a_33  ...  a_3j  ...  a_3n ]     [ a_3^R ]
A  =  [  ...                                   ]  =  [  ...  ]
      [ a_i1  a_i2  a_i3  ...  a_ij  ...  a_in ]     [ a_i^R ]
      [  ...                                   ]     [  ...  ]
      [ a_m1  a_m2  a_m3  ...  a_mj  ...  a_mn ]     [ a_m^R ]

or

A = [ a_1^C  a_2^C  a_3^C  ...  a_j^C  ...  a_n^C ]

The following examples will show how vectors and matrices arise 


naturally in linear models introduced in Chapter 1. 


Example 1. Matrix Notation for Oil 
Refinery Model 

In Section 1.2 we introduced a system of linear equations modeling
the production of three products (heating oil, diesel oil, gasoline) by
three refineries. The system of equations was


Heating oil:  20x_1 +  4x_2 +  4x_3 = 500
Diesel oil:   10x_1 + 14x_2 +  5x_3 = 850     (2)
Gasoline:      5x_1 +  5x_2 + 12x_3 = 1000


We can make a matrix A of the coefficients on the left sides
in (2).

      [ 20   4   4 ]
A  =  [ 10  14   5 ]     (3)
      [  5   5  12 ]


Each column of A is a vector of outputs by a refinery. For example,
from 1 barrel of oil, refinery 2 produces an output vector

          [  4 ]
a_2^C  =  [ 14 ]
          [  5 ]

Each row on the left side is a vector of amounts produced of some
product. The vector for gasoline is a_3^R = [5, 5, 12]. The right-side
numbers in (2) form a demand vector.
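
The row and column notation can be mirrored in code. This is a minimal sketch with hypothetical helper names (`entry`, `row`, `column`); indices are 1-based to match the text, while the Python lists underneath are 0-based.

```python
# Coefficient matrix of system (2) and the demand vector on its right side.
A = [[20, 4, 4],
     [10, 14, 5],
     [5, 5, 12]]
demand = [500, 850, 1000]


def entry(m, i, j):
    """a_ij: the number in row i, column j (1-based, as in the text)."""
    return m[i - 1][j - 1]


def row(m, i):
    """a_i^R: the ith row vector."""
    return list(m[i - 1])


def column(m, j):
    """a_j^C: the jth column vector."""
    return [r[j - 1] for r in m]


refinery_2_output = column(A, 2)   # products obtained from 1 barrel in refinery 2
gasoline_row = row(A, 3)           # gasoline produced per barrel at each refinery
```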


Example 2. Matrix Notation for Leontief 
Economic Model 


The Leontief model for economic equilibrium in Example 2 of Section
1.2 contained the following system of supply-demand equations:

              Supply     Industrial Demand                 Consumer Demand
Energy:       x_1 = .4x_1 + .2x_2 + .2x_3 + .2x_4   +   100
Construct.:   x_2 = .3x_1 + .3x_2 + .2x_3 + .1x_4   +    50     (4)
Transport.:   x_3 = .1x_1 + .1x_2         + .2x_4   +   100
Steel:        x_4 =         .1x_2 + .1x_3           +     0


It is natural to form a matrix D of the coefficients of industrial demands
on the right-hand side of (4). The set of consumer demands in the last
column in (4) form a vector c.


      [ .4  .2  .2  .2 ]            [ 100 ]
D  =  [ .3  .3  .2  .1 ]     c  =   [  50 ]     (5)
      [ .1  .1   0  .2 ]            [ 100 ]
      [  0  .1  .1   0 ]            [   0 ]


Recall that the second row d_2^R = [.3, .3, .2, .1] tells how much of
product 2 (construction) is needed to produce 1 dollar’s worth of the
other products; for example, it takes d_23 = .2 dollar of product 2 to
make 1 dollar of product 3 (transportation). Similarly, the third column

          [ .2 ]
d_3^C  =  [ .2 ]
          [  0 ]
          [ .1 ]

tells the inputs required to make 1 dollar of product 3.


Example 3. Matrix Notation for a Markov Chain


The transition probabilities of the Markov chain in Example 2 of Sec- 
tion 1.3 (about a frog wandering around in a highway) form a transition 
matrix A: 


OO Ye 


The columns are associated with current states and the rows with states
in the next period. Entry a_ij is the transition probability of going from
state j to state i, that is, the probability that if the frog is now in
state j, then 1 minute later it will be in state i. The third column a_3^C
gives the probability distribution among states next period if we are
currently in state 3.

In general, we have a vector p of the current probability distribution
p = [p_1, p_2, p_3, p_4, p_5, p_6]; that is, p_i is the probability that
the frog is currently in state i. From p and the transition probabilities
in A, we obtained a system of equations that allowed us to compute
the probability distribution p' for the next period [see equations (4) of
Section 1.3]. By repeating the process of computing the next-period
probabilities k times, we could compute the probability distribution p^(k)
after k periods (see Table 1.1).


Example 4. A Vector as a Point in Space 


A common use of vector notation is to represent points in space. In 
two-dimensional space, we use a 2-vector; in n-dimensional space, we 
use an n-vector. The point in three-dimensional x-y-z space with
coordinates x = 2, y = 7, z = 1 is written as the vector [2, 7, 1]. Often
a two- or three-dimensional vector is represented by an arrow going
from the origin to the point with these coordinates, as shown in Figure
2.1.

The collection of all 3-vectors is all of three-dimensional space; 
all n-vectors are n-dimensional space. Much of the algebraic theory 
about collections of vectors is equivalent to the geometric theory of 
the corresponding spaces of points. In Section 5.2 we discuss some 
properties of the collection of vectors that satisfy a given system of 
linear equations. 

It is often helpful to think of an n-vector as a point in n-space. 
For example, we can talk naturally about the distance between two 
vectors. We can use x-y coordinates to plot the behavior of a linear 
model involving two-dimensional vectors.


The thoughtful reader may rightly ask: "You have shown me one-dimensional
arrays and two-dimensional arrays, so what about higher-dimensional
arrays?" We see higher-dimensional arrays from time to time in
computer programs. If programmers work with three-dimensional arrays, do
not mathematicians? The answer surprisingly is basically "no" (although
tensors are a higher-dimensional extension of matrices). Historically,
matrices have been closely associated with systems of linear equations, as in
Examples 1 and 2. The operations performed on matrices are defined with
an eye on the associated systems of equations. The absence of a natural
higher-dimensional version of a system of linear equations is the major
reason why mathematicians have only been concerned with one- and
two-dimensional arrays.


Figure 2.1 Vector as an arrow.

Matrix elements need not be numbers, as this next example illustrates. 


Example 5. Encoding Messages with Matrices


In Example 1 of Section 1.5 we introduced some linear models for
encoding messages by converting letters to numbers between 1 and 26.
In this example we show how to scramble a message without transforming
the letters. We place the message into a matrix and perform
simple scrambling functions on this matrix.

Suppose that our message is


ALLIED SOLDIERS SHOULD REMAIN ON ALERT


The message has 33 letters. We use this list of the letters (ignoring
spaces) to fill the entries in the first row, then the second row, and so
on in a matrix M. We want M to be a square matrix. To accommodate
33 letters, we need a 6-by-6 matrix (with 36 entries). We add three
E’s (or any nonsense letters) at the end of the message to fill out the
matrix.


      [ A  L  L  I  E  D ]
      [ S  O  L  D  I  E ]
M  =  [ R  S  S  H  O  U ]     (7)
      [ L  D  R  E  M  A ]
      [ I  N  O  N  A  L ]
      [ E  R  T  E  E  E ]


Consider the following simple operations on a square matrix.


1. Interchange two rows (or two columns).

2. Interchange ith row and ith column.

3. Rearrange entries in a row (or column), such as reversing the order
of entries or cyclically permuting.


A suitable sequence of 10 operations, chosen from these three
types, will produce an array of letters that will be impossible to
unscramble without knowing what operations were performed. As an
example, suppose that we interchange the first row and column; then
interchange the new third row and column; and then interchange the
new sixth row and column. The result is

       [ A  S  L  L  I  D ]
       [ L  O  S  D  I  R ]
M*  =  [ R  L  S  R  O  U ]     (8)
       [ I  D  H  E  M  E ]
       [ E  N  O  N  A  E ]
       [ E  E  T  A  L  E ]

Already, the message is unintelligible.
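
The scrambling in this example can be sketched in a few lines. The function names are our own, and we read "interchange the ith row and ith column" as: the old column becomes the new row and the old row becomes the new column, with the shared diagonal entry left in place.

```python
def fill_square(message, size, pad="E"):
    """Write the letters of `message` row by row into a size-by-size matrix,
    padding the end with `pad`."""
    letters = [ch for ch in message if ch.isalpha()]
    letters += [pad] * (size * size - len(letters))
    return [letters[r * size:(r + 1) * size] for r in range(size)]


def swap_row_and_column(m, i):
    """Interchange row i and column i (1-based) of a square matrix."""
    k = i - 1
    old_row = list(m[k])
    old_col = [r[k] for r in m]
    out = [list(r) for r in m]
    out[k] = old_col                 # old column becomes the new row
    for r in range(len(m)):
        out[r][k] = old_row[r]       # old row becomes the new column
    return out


M = fill_square("ALLIED SOLDIERS SHOULD REMAIN ON ALERT", 6)

scrambled = M
for i in (1, 3, 6):                  # the three interchanges described above
    scrambled = swap_row_and_column(scrambled, i)

rows = ["".join(r) for r in scrambled]
```

Under this reading of the operation, `rows` holds the six rows of M*. Applying the same three interchanges in reverse order (6, 3, 1) restores M, since each interchange undoes itself.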


Now we define some simple operations on matrices. The most basic 
operation is to multiply a vector or matrix by a constant c. This operation 
is called scalar multiplication. A scalar is a single number, as opposed to 
a vector or matrix. Scalar multiplication is performed by multiplying each 
entry in the vector or matrix by the constant c. For example, 


° ae: > & Hae 73 
M=i13s 9 £ 31; then3M =|9 27 6 I5 
i © -B “Z a AS 1s” EH 


Addition of vectors and matrices is straightforward—add the corre- 
sponding entries together. There is one minor problem, however. Two vec- 
tors being added together must have the same length, and two matrices being 
added must have the same number of rows and same number of columns. 
For example, if


t ‘%§ 4 4 6 
A= are and B=)0 41, thenA + B= pS, 
—7 QO i 2 —6§ 2 
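Both operations are entry-by-entry, which a short Python sketch makes concrete. The helper names are ours; the row [3, 9, 2, 5] is the one legible row of the scalar multiplication example above, and the small matrices A and B are hypothetical values chosen only for illustration.

```python
def scalar_multiply(c, matrix):
    """Multiply every entry of a matrix (a list of rows) by the scalar c."""
    return [[c * entry for entry in row] for row in matrix]

def add_matrices(a, b):
    """Add two matrices entry by entry; both must have the same shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same numbers of rows and columns")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

tripled = scalar_multiply(3, [[3, 9, 2, 5]])

# Hypothetical 2-by-2 matrices, not taken from the text.
A = [[1, 2], [3, 4]]
B = [[5, 0], [-1, 2]]
total = add_matrices(A, B)
```

Attempting to add matrices of different shapes raises an error rather than silently producing a result, which mirrors the size requirement stated above.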


Example 6. Matrices of Test Scores 


Suppose that we are recording the test scores of four students in three
subjects. To preserve confidentiality, we will call the students A, B,
C, and D, and the subjects 1, 2, and 3. The students have two hour
exams and a final exam in each course, each graded out of 10 points.
For each of the three tests we form a matrix of test scores with rows
for students and columns for subjects. Call the matrices $S_1$, $S_2$, and $S_3$
($S_3$ is the matrix of final exam scores).


5 9 5 6 7 9 
ee RS ee 
DP IST et eg 8 
5 6 7 6 5 6 
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Then the matrix T of total scores of each student in each course (with- 
out any weighting to make the final more important) is 


\[ T = S_1 + S_2 + S_3 \]


Summing the corresponding entries in $S_1$, $S_2$, and $S_3$, we obtain the
matrix T:


6 8 9 s-.9: § at 
ee eo Pe 
a a 7 8 8 a 2 
4 6 6 coe F i SS 
(10) 
17 24 26 
122 18-26 
Bae pee ee Nia 
i 17 19 


Suppose that the final should be weighted twice as much as each hour
test. Each test had a total of 10 points, and we want the course score
also to be out of 10 points. That is, the course score is a weighted
average of the tests. Then the matrix C of weighted averages of course
scores has the form

\[ C = \tfrac{1}{4} S_1 + \tfrac{1}{4} S_2 + \tfrac{1}{2} S_3 \tag{11} \]

We compute C by computing the linear combination in (11) for each
entry. For example, the entry $c_{12}$, student A's weighted average in
course 2, is

\[ c_{12} = \tfrac{1}{4}(S_1)_{12} + \tfrac{1}{4}(S_2)_{12} + \tfrac{1}{2}(S_3)_{12} \]

A computer program to compute all the $c_{ij}$ entries would look as
follows:


FOR I = 1 TO 4
  FOR J = 1 TO 3
    C(I, J) = .25*S1(I, J) + .25*S2(I, J) + .5*S3(I, J)
  NEXT J
NEXT I


Using this program, we obtain C (fractions ≥ .5 have been rounded
up; that is, 3.6 is written as 4):
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ae ee 
Aj6é 8 9 
C= B)8 6 9 (12) 
Cra.) J8 
Di>, 6 6 


Example 6 implicitly shows why we rarely have to check whether a 
set of matrices that we want to add together has the same numbers of rows 
and columns. We would not want to add the matrices together unless the 
entries in the matrices matched up in some natural way. 
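The weighted-average computation of Example 6 can be sketched in Python as follows. The function name and the score matrices below are hypothetical (two students, three subjects), since this is only an illustration of the linear combination in (11) and of rounding fractions of .5 or more upward.

```python
import math

def weighted_course_scores(s1, s2, s3, weights=(0.25, 0.25, 0.5)):
    """Entry-by-entry linear combination w1*S1 + w2*S2 + w3*S3, rounded half up."""
    w1, w2, w3 = weights
    rows, cols = len(s1), len(s1[0])
    return [
        [math.floor(w1 * s1[i][j] + w2 * s2[i][j] + w3 * s3[i][j] + 0.5)
         for j in range(cols)]
        for i in range(rows)
    ]

# Hypothetical scores out of 10: rows are students, columns are subjects.
S1 = [[6, 8, 9], [7, 5, 8]]
S2 = [[5, 9, 8], [9, 6, 10]]
S3 = [[6, 7, 9], [8, 7, 9]]

C = weighted_course_scores(S1, S2, S3)
```

`math.floor(x + 0.5)` is used instead of Python's built-in `round`, because `round` sends exact halves to the nearest even integer rather than always upward.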


Section 2.1 Exercises 


Summary of Exercises 
The exercises in this section are straightforward variants on the examples in
this section.


1. Given the matrix 
TS es tee | 
A=szi2 4 6 8 
4°) Che 


write out the following row and column vectors, and entries. 
(a) af (b) aS (CD aS se J ay— (@) Ay, 


2. In the matrix of letters 


B Re Sf A 
ee N PF @ &o wy 
| nm. BO bk 
MG. ¥ FP UK 
spell out the words given by the following sequence of entries. 
(A) 43, Ay), 435, Ar (D) G35, 434, G2), 4y,, Ays, G2 
(C) @)), 435, 432, G23, As (d) 435, y5, Qy4, 424, A345 Az3, G33, Ay4 


3. Consider the following Markov chain model involving the states of mind
of Professor Mindthumper. The states are Alert (A), Hazy (H), and
Stupor (S). If in state A or H today, then tomorrow the professor has a
1/3 chance of being in each of the three states. If in state S today,
tomorrow with probability 1 the professor is still in state S.

(a) Write the transition matrix A for this Markov chain. 

(b) Write out entry a,, and column aS. 

(c) Which pairs of rows and pairs of columns in this Markov chain are 

the same? 
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. In the transition matrix A for the frog Markov chain in Example 3, 


what does entry a,, represent? 


. (a) In the matrix A for the refinery model in Example 1, state in words 


what the numbers in entries a,, and entries a,, represent. What do 
the numbers in the third column of A represent? 

(b) Suppose that refinery 3 is modernized and its output for each barrel 
of oil is doubled. What is the new matrix of coefficients? 

(c) In Example 1 of Section 1.4 we discussed the situation where
refinery 3 broke down and was out of service. In this case, what is
the matrix of coefficients?


6. Make a matrix for data of Scrooge high school GPAs and college GPAs
for the set of students in Example 2 of Section 1.4.


7. Write out the matrix of coefficients in the inequality constraints of the
linear program in Example 4 of Section 1.4.


Plot the following vectors as points in the x-y plane. 
(a) (1,0) (b (2,4) © 2, -T 


9. Plot the following points on an x-y-z grid of the sort given in
Example 4.
(a) [1, 0, 0] (b) [1, 1, 1] (c) [2, 4, 1] (d) [2, -1, 3]


10. Scramble the matrix M in Example 5 by performing the following
sequences of changes.

(a) Interchange row 2 and column 2; interchange rows 3 and 5;
interchange columns 1 and 4.

(b) Reverse the order of the letters in row 2; do the same in column
3; interchange row 1 and column 1; do the same for row 4 and
column 4.

(c) Reverse the order of each row; then reverse the order of each
column.


11. In Example 5, why is it unclear how one should define the process of
interchanging row i with column j (for i ≠ j)?

Hint: What will entry (1, 2) be when we try interchanging row 1 and
column 2?


Let 
] 
A=/]2 
E 


Determine
(a) 3A (b) 2B (c) -3B (d) A + 3B (e) 2A + 3B
(f) 3A - 2B

Let all matrices in this exercise be 4-by-4. Let I denote the matrix with 
1’s in the main diagonal and 0’s elsewhere. Let J denote the matrix 
with each entry equal to 1, and let A be the matrix 


Li he beta 
o. h4003 
cake) PM aR 
6) £ JOiet 


Express the following matrices as linear combinations of I, J, and A. 


(a) (b) (c) 


wm NN AD 
Nm NN A N 


5 
3 
| 
3 


—_ Ww Un Ww 
A WD = W 


3 
5 
3 


NAN N 
ON NY WH 
a 
or CO = 
- OF OC 
or Or 


14. Show that any vector $\mathbf{x} = [x_1, x_2]$ that is a multiple of [2, 1] (i.e.,
$\mathbf{x} = c[2, 1]$ for some c) satisfies the system of equations

\[
\begin{aligned}
x_1 - 2x_2 &= 0 \\
-2x_1 + 4x_2 &= 0
\end{aligned}
\]


15. Suppose in Example 6 that the final exam counted three times as much
as an hour exam, so that the weights on the three tests should be 1/5, 1/5,
3/5, respectively. Recompute the course score matrix C with these
weights.


Write a computer program to add two matrices A and B, where both 
are m-by-n. Assume m, n given and that the entries of the matrices are 
stored in arrays A(I,J) and B(I,J). 


Write a computer program to read in scalars r and s and then compute 
the linear combination rA + sB of the m-by-n matrices A and B. 
Assume m, n given and that the entries of the matrices are stored in 
arrays A(I,J) and B(I,J). 


Section 2.2 Matrix Multiplication


In Section 2.1 we introduced vectors and matrices and showed how to add 
them and multiply them by a scalar. These two operations were obvious and 
straightforward, and accordingly they are not powerful tools. In this section 
we discuss multiplication of vectors and matrices. This operation is more 
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complicated but also more useful. It provides a simple notation for express- 
ing systems of linear equations and associated calculations. 

Consider the following typical situation, which requires vector
multiplication. We have a vector p of prices for a set of three vegetables. Suppose
that p = [.80, 1.00, .50], where the ith entry is the price of the ith vegetable.
We are also given a vector d of the weekly demand in a household for these
three vegetables. Suppose that d = [5, 3, 4]. We shall define
vector-times-vector multiplication so that p · d equals the cost of the household's weekly
demand for these three vegetables. In this case,

\[
\begin{aligned}
\mathbf{p}\cdot\mathbf{d} &= [.80,\ 1.00,\ .50]\cdot[5,\ 3,\ 4] \\
&= .80\times 5 + 1.00\times 3 + .50\times 4 \\
&= 4.00 + 3.00 + 2.00 = 9.00
\end{aligned}
\]


Vector Multiplication


Let a and b be two n-vectors, where $\mathbf{a} = [a_1, a_2, \ldots, a_n]$ and
$\mathbf{b} = [b_1, b_2, \ldots, b_n]$. Then the product a · b, called the scalar product of a
and b, is a single number (a scalar) equal to the sum of the products $a_i b_i$.
That is,

\[ \mathbf{a}\cdot\mathbf{b} = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n \]

Vector multiplication a · b makes sense only when a and b have the
same length.

The scalar product is also sometimes called the inner product or dot
product (the latter term coming from the dot used in writing the product).

An important geometric interpretation of scalar products is discussed in
Chapter 5.
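As a quick check of the definition, here is a minimal Python sketch of the scalar product, applied to the vegetable price and demand vectors used above. The function name `scalar_product` is our own choice.

```python
def scalar_product(a, b):
    """Return a . b = a_1*b_1 + ... + a_n*b_n for two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("scalar product needs vectors of the same length")
    return sum(x * y for x, y in zip(a, b))

prices = [0.80, 1.00, 0.50]   # vector p of vegetable prices
demand = [5, 3, 4]            # vector d of weekly demand
weekly_cost = scalar_product(prices, demand)
```

The length check enforces the remark that a · b makes sense only for vectors of the same length.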


Example 1. Calculating Time to Process
Computer Jobs


A Superduper computer requires 3 minutes to do a type 1 job (say, a
statistics problem), 4 minutes to do a type 2 job, and 2 minutes to do
a type 3 job. The computer has 6 type 1 jobs, 8 type 2 jobs, and 10
type 3 jobs. How long will the computer take to perform all these jobs?
If t = [3, 4, 2] is the vector of the times to do the various jobs
and n = [6, 8, 10] is the vector of the numbers of each type of job,
the total time required will be the value of the scalar product t · n.

\[
\begin{aligned}
\text{Total time} = \mathbf{t}\cdot\mathbf{n} &= [3,\ 4,\ 2]\cdot[6,\ 8,\ 10] \\
&= 3\times 6 + 4\times 8 + 2\times 10 \\
&= 18 + 32 + 20 = 70
\end{aligned}
\]


The key idea about a scalar product is: It is a linear combination of
the entries in each vector. Any linear combination of variables or numbers
can be expressed as a scalar product. Consider the linear equation

\[ 20x_1 + 4x_2 + 4x_3 = 500 \]

The left side is a linear combination of variables. If a = [20, 4, 4] and
$\mathbf{x} = [x_1, x_2, x_3]$, the left side can be written as a scalar product

\[ 20x_1 + 4x_2 + 4x_3 = [20,\ 4,\ 4]\cdot[x_1,\ x_2,\ x_3] = \mathbf{a}\cdot\mathbf{x} \]

Similarly, any linear equation or system of linear equations can be
written in terms of scalar products.


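Example 2. Representing the Refinery System
of Equations


Recall the system of equations for the refinery production problem in
Section 1.2.

\[
\begin{aligned}
20x_1 + 4x_2 + 4x_3 &= 500 \\
10x_1 + 14x_2 + 5x_3 &= 850 \\
5x_1 + 5x_2 + 12x_3 &= 1000
\end{aligned} \tag{1a}
\]

Or making a vector of the quantities on each side of these equations,

\[
\begin{bmatrix} 20x_1 + 4x_2 + 4x_3 \\ 10x_1 + 14x_2 + 5x_3 \\ 5x_1 + 5x_2 + 12x_3 \end{bmatrix}
= \begin{bmatrix} 500 \\ 850 \\ 1000 \end{bmatrix} \tag{1b}
\]

Let A be the 3-by-3 matrix of the coefficients on the left side of the
equations in (1a) with row vectors $\mathbf{a}_1^R$, $\mathbf{a}_2^R$, $\mathbf{a}_3^R$. Let b be the right-side
vector, and let x be the vector of unknowns.

\[
A = \begin{matrix} \mathbf{a}_1^R \\ \mathbf{a}_2^R \\ \mathbf{a}_3^R \end{matrix}
\begin{bmatrix} 20 & 4 & 4 \\ 10 & 14 & 5 \\ 5 & 5 & 12 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 500 \\ 850 \\ 1000 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \tag{2}
\]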


The left sides of the equations in (1b) are a vector of scalar products
of x with the rows of A:

\[
\begin{bmatrix} 20x_1 + 4x_2 + 4x_3 \\ 10x_1 + 14x_2 + 5x_3 \\ 5x_1 + 5x_2 + 12x_3 \end{bmatrix}
= \begin{bmatrix} [20,\ 4,\ 4]\cdot\mathbf{x} \\ [10,\ 14,\ 5]\cdot\mathbf{x} \\ [5,\ 5,\ 12]\cdot\mathbf{x} \end{bmatrix}
= \begin{bmatrix} \mathbf{a}_1^R\cdot\mathbf{x} \\ \mathbf{a}_2^R\cdot\mathbf{x} \\ \mathbf{a}_3^R\cdot\mathbf{x} \end{bmatrix}
= A\mathbf{x} \tag{3}
\]

As noted in (3), we call the result of multiplying each row of A times
a vector x the matrix-vector product Ax. Thus in matrix notation, (1)
is written simply Ax = b.


By treating a matrix as a set of row vectors, we can extend our defi- 
nition of vector-times-vector multiplication to matrix-times-vector multipli- 
cation. 


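Matrix-Vector Multiplication


Let A be an m-by-n matrix and b be an n-vector. Let $\mathbf{a}_i^R$ be the ith row of
A. Then the matrix-vector product Ab (the multiplication sign is normally
omitted) is defined to be the column vector of scalar products $\mathbf{a}_i^R \cdot \mathbf{b}$:

\[
A\mathbf{b} = \begin{bmatrix} \mathbf{a}_1^R \\ \mathbf{a}_2^R \\ \vdots \\ \mathbf{a}_m^R \end{bmatrix}\mathbf{b}
= \begin{bmatrix} \mathbf{a}_1^R\cdot\mathbf{b} \\ \mathbf{a}_2^R\cdot\mathbf{b} \\ \vdots \\ \mathbf{a}_m^R\cdot\mathbf{b} \end{bmatrix} \tag{4}
\]

For example, if

\[
A = \begin{bmatrix} -1 & 0 & 2 \\ 2 & 1 & 1 \\ 3 & 3 & 3 \end{bmatrix}
\quad\text{and}\quad
\mathbf{b} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
\]

then

\[
A\mathbf{b} = \begin{bmatrix} -1 & 0 & 2 \\ 2 & 1 & 1 \\ 3 & 3 & 3 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
= \begin{bmatrix} -1\times 1 + 0\times 2 + 2\times 1 \\ 2\times 1 + 1\times 2 + 1\times 1 \\ 3\times 1 + 3\times 2 + 3\times 1 \end{bmatrix}
= \begin{bmatrix} 1 \\ 5 \\ 12 \end{bmatrix}
\]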


What if we want to multiply a vector b times the columns of A? The 
convention is that when a vector b multiplies the rows of A, b is written to 
the right of A in the product, as in (4). When b multiplies the columns of 
A, then b is written to the left of A as in (5). The reason for this convention 
will become evident shortly. 


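\[
\mathbf{b}A = \mathbf{b}\,[\mathbf{a}_1^C,\ \mathbf{a}_2^C,\ \ldots,\ \mathbf{a}_n^C]
= [\mathbf{b}\cdot\mathbf{a}_1^C,\ \mathbf{b}\cdot\mathbf{a}_2^C,\ \ldots,\ \mathbf{b}\cdot\mathbf{a}_n^C] \tag{5}
\]

For example, if we reverse the order of A and b in the previous
computation of Ab, we have

\[
\begin{aligned}
\mathbf{b}A &= [1,\ 2,\ 1]\begin{bmatrix} -1 & 0 & 2 \\ 2 & 1 & 1 \\ 3 & 3 & 3 \end{bmatrix} \\
&= [1\times(-1) + 2\times 2 + 1\times 3,\ \ 1\times 0 + 2\times 1 + 1\times 3,\ \ 1\times 2 + 2\times 1 + 1\times 3] \\
&= [6,\ 5,\ 7]
\end{aligned}
\]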


Remember that the length of b must equal the length of the rows of A in the 
product Ab. Similarly, the length of b must equal the length of the columns 
of A in the product bA. 
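The two products can be sketched in Python as follows. The function names `mat_vec` and `vec_mat` are ours, and the matrix and vector are the ones whose entries can be read off the worked computations of Ab and bA above.

```python
def mat_vec(A, b):
    """Ab: the scalar product of each row of A with b."""
    if any(len(row) != len(b) for row in A):
        raise ValueError("length of b must equal the length of the rows of A")
    return [sum(a * x for a, x in zip(row, b)) for row in A]

def vec_mat(b, A):
    """bA: the scalar product of b with each column of A."""
    if len(b) != len(A):
        raise ValueError("length of b must equal the length of the columns of A")
    return [sum(b[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

A = [[-1, 0, 2],
     [2, 1, 1],
     [3, 3, 3]]
b = [1, 2, 1]

Ab = mat_vec(A, b)   # b multiplies the rows of A
bA = vec_mat(b, A)   # b multiplies the columns of A
```

Note that the two results differ even though the same A and b are used, so the side on which b is written matters.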


Example 3. Comparing Computations by
Different Computers


In Example | we computed how long it would take a Superduper 
computer to complete a set of jobs. There were three types of jobs and 
a vector t = [3, 4, 2] of times for Superduper to do each type of job. 
Suppose that we also have three other brands of computers, Wacko, 
Whooper, and Ultima, and for each there is a similar vector of times 
to do the jobs. Let us put all these vectors into a matrix A: 


                 Type of Job
                 1    2    3
    Superduper   3    4    2
A = Wacko        5    7    3      Matrix of times
    Whooper      1    2    1
    Ultima       3    3    3


In Example 1 we computed how long it would take a Superduper
computer to do 6 type 1, 8 type 2, and 10 type 3 jobs by forming the
scalar product of the Superduper time vector [3, 4, 2] with the
number-of-jobs vector n = [6, 8, 10]. Now let us find out how long
it would take each of the computers to do this set of jobs by multiplying
each row of A times n, that is, by computing An.

\[
A\mathbf{n} = \begin{bmatrix} 3 & 4 & 2 \\ 5 & 7 & 3 \\ 1 & 2 & 1 \\ 3 & 3 & 3 \end{bmatrix}
\begin{bmatrix} 6 \\ 8 \\ 10 \end{bmatrix}
= \begin{bmatrix} 3\times 6 + 4\times 8 + 2\times 10 \\ 5\times 6 + 7\times 8 + 3\times 10 \\ 1\times 6 + 2\times 8 + 1\times 10 \\ 3\times 6 + 3\times 8 + 3\times 10 \end{bmatrix}
= \begin{bmatrix} 70 \\ 116 \\ 32 \\ 72 \end{bmatrix}
\]

The final column tells us that the set of jobs takes 70 minutes for
Superduper, 116 minutes for Wacko, 32 minutes for Whooper, and
72 minutes for Ultima.


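Example 2 (continued). Representing the Refinery
Systems of Equations


Let us quickly review our matrix notation for the refinery system of
equations.

\[
\begin{bmatrix} 20x_1 + 4x_2 + 4x_3 \\ 10x_1 + 14x_2 + 5x_3 \\ 5x_1 + 5x_2 + 12x_3 \end{bmatrix}
= \begin{bmatrix} 500 \\ 850 \\ 1000 \end{bmatrix}
\]

Let A be the 3-by-3 matrix of the coefficients on the left side of the
equations in (1a), b be the right-side demand vector, and x be the
vector of unknowns:

\[
A = \begin{bmatrix} 20 & 4 & 4 \\ 10 & 14 & 5 \\ 5 & 5 & 12 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 500 \\ 850 \\ 1000 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]

The left sides of the equations are a vector of scalar products of x with
the rows of A. This vector is simply Ax:

\[
\begin{bmatrix} 20x_1 + 4x_2 + 4x_3 \\ 10x_1 + 14x_2 + 5x_3 \\ 5x_1 + 5x_2 + 12x_3 \end{bmatrix}
= \begin{bmatrix} 20 & 4 & 4 \\ 10 & 14 & 5 \\ 5 & 5 & 12 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = A\mathbf{x} \tag{6}
\]

Note that we can write (6) as a weighted sum of vectors:

\[
x_1\begin{bmatrix} 20 \\ 10 \\ 5 \end{bmatrix}
+ x_2\begin{bmatrix} 4 \\ 14 \\ 5 \end{bmatrix}
+ x_3\begin{bmatrix} 4 \\ 5 \\ 12 \end{bmatrix}
= \begin{bmatrix} 500 \\ 850 \\ 1000 \end{bmatrix} \tag{7}
\]

Or in vector notation, (7) becomes

\[ x_1\mathbf{a}_1^C + x_2\mathbf{a}_2^C + x_3\mathbf{a}_3^C = \mathbf{b} \]

Observe how (7) views Ax = b in terms of columns, while (6) views
Ax in terms of rows.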

We shall use the notation Ax = b for a system of equations over and 
over again herein. For another example we consider the system of equations 
in the Leontief economic model. 


Example 4. Matrix Representation of Leontief
Economic Model


The supply—demand equations of the Leontief economic model in Sec- 
tion 1.2 can be written 


x, | = | 4x, + 2x, + .26, + .2x, | + 1100 
ei = 13x), + ote ces tla +7 50 (8) 
ky | =f 1%, + Ly + + .2x,| + | 100 
X,) = + At, oF Ax, + 0 
Let $\mathbf{x} = [x_1, x_2, x_3, x_4]$, let D be the matrix of coefficients on the
right-hand sides of (8), and let c be the rightmost column vector of
consumer demands:


x, A 2 2 2 | F518 
= ; + 
X> Sy seas. coe XS 50 (9a) 
oe ie ow eee 21} x5} + | 100 
X,| = 1 x | + 0 
or 
\[ \mathbf{x} = D\mathbf{x} + \mathbf{c} \tag{9b} \]


The system of equations in (8) can also be written as a linear 
combination of columns: 


x, 4 2 2 2 100 

x, a es) ee a A}, | 50 no 
= Xx xX xX x 

<1 ol Fe 4, Be *1 0 "8 33 100 

X4 0 | | 0 0) a 



Example 5. Matrix Notation for 
a Linear Program 


In Example 4 of Section 1.4 we presented the following linear program
for maximizing revenue from planting two crops, corn (C) and wheat
(W).

Maximize 60C + 40W
subject to C ≥ 0, W ≥ 0 and

    Land:        C +    W ≤ 200
    Labor:      4C +    W ≤ 320          (11)
    Capital:   20C +  10W ≤ 2200

If we let

\[
A = \begin{bmatrix} 1 & 1 \\ 4 & 1 \\ 20 & 10 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 200 \\ 320 \\ 2200 \end{bmatrix}
\]

and c = [60, 40], x = [C, W], the inequality constraints in (11) can
be written

\[
\begin{bmatrix} 1 & 1 \\ 4 & 1 \\ 20 & 10 \end{bmatrix}
\begin{bmatrix} C \\ W \end{bmatrix}
\le \begin{bmatrix} 200 \\ 320 \\ 2200 \end{bmatrix}
\quad\text{or}\quad A\mathbf{x} \le \mathbf{b}
\]

In matrix notation, (11) can be written

Maximize c · x                          (12)
subject to x ≥ 0 and Ax ≤ b

where 0 is a vector of all zeros.
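To see the matrix form of the linear program in action, the Python sketch below evaluates the objective c · x and tests the constraints x ≥ 0 and Ax ≤ b for a candidate plan. The helper names and the two planting plans are hypothetical; the sketch only checks feasibility and does not solve the linear program.

```python
def dot(u, v):
    """Scalar product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def mat_vec(A, x):
    """Matrix-vector product Ax, one scalar product per row of A."""
    return [dot(row, x) for row in A]

A = [[1, 1], [4, 1], [20, 10]]   # land, labor, capital coefficients
b = [200, 320, 2200]             # available land, labor, capital
c = [60, 40]                     # revenue per unit of corn and wheat

def is_feasible(x):
    """True when x >= 0 and Ax <= b hold entry by entry."""
    nonnegative = all(v >= 0 for v in x)
    within_limits = all(lhs <= rhs for lhs, rhs in zip(mat_vec(A, x), b))
    return nonnegative and within_limits

plan = [50, 100]                 # hypothetical: 50 units of corn, 100 of wheat
revenue = dot(c, plan)
```

For this hypothetical plan the three constraint values are 150, 300, and 2000, all within their limits, whereas a plan such as [100, 100] would need 500 units of labor and is rejected.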


Example 6. Matrix Notation for Markov Chains


In Section 1.3 we introduced the concept of a Markov chain and in
Example 2 of Section 1.3 gave the following Markov chain for the
random movements of a frog across a highway; the possible locations
for the frog were represented as states 1 through 6. The matrix of
transition probabilities A was

\[
A = \begin{bmatrix}
.50 & .25 & 0 & 0 & 0 & 0 \\
.50 & .50 & .25 & 0 & 0 & 0 \\
0 & .25 & .50 & .25 & 0 & 0 \\
0 & 0 & .25 & .50 & .25 & 0 \\
0 & 0 & 0 & .25 & .50 & .50 \\
0 & 0 & 0 & 0 & .25 & .50
\end{bmatrix}
\]

We let $\mathbf{p} = [p_1, p_2, \ldots, p_6]$ be the current probability distribution
vector ($p_i$ is the probability the frog is currently in the ith state) and
$\mathbf{p}'$ be the vector of the probability distribution in the next minute. We
developed the following system of linear equations to determine $\mathbf{p}'$
from A and p.

\[
\begin{aligned}
p_1' &= .50p_1 + .25p_2 \\
p_2' &= .50p_1 + .50p_2 + .25p_3 \\
p_3' &= .25p_2 + .50p_3 + .25p_4 \\
p_4' &= .25p_3 + .50p_4 + .25p_5 \\
p_5' &= .25p_4 + .50p_5 + .50p_6 \\
p_6' &= .25p_5 + .50p_6
\end{aligned} \tag{13}
\]

\[
\begin{bmatrix} p_1' \\ p_2' \\ p_3' \\ p_4' \\ p_5' \\ p_6' \end{bmatrix}
= \begin{bmatrix}
.50 & .25 & 0 & 0 & 0 & 0 \\
.50 & .50 & .25 & 0 & 0 & 0 \\
0 & .25 & .50 & .25 & 0 & 0 \\
0 & 0 & .25 & .50 & .25 & 0 \\
0 & 0 & 0 & .25 & .50 & .50 \\
0 & 0 & 0 & 0 & .25 & .50
\end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \\ p_6 \end{bmatrix}
\]

or

\[ \mathbf{p}' = A\mathbf{p} \tag{14} \]


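Note that each individual equation in (13) can be written as

\[ p_i' = \mathbf{a}_i^R \cdot \mathbf{p} \tag{14a} \]

Let us recall what (14) represents for a general Markov transition
matrix A and initial distribution p. For simplicity, let A be 2-by-2.

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad
\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}
\]

Then $\mathbf{p}' = A\mathbf{p}$ becomes

\[
\begin{bmatrix} p_1' \\ p_2' \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \end{bmatrix}
= \begin{bmatrix} a_{11}p_1 + a_{12}p_2 \\ a_{21}p_1 + a_{22}p_2 \end{bmatrix}
\]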
Now that we have a concise way of writing Markov chain
calculations, we can easily write equations to express the probability
distribution vector $\mathbf{p}''$ for the frog 2 minutes from now.

\[ \mathbf{p}'' = A\mathbf{p}' = A(A\mathbf{p}) = A^2\mathbf{p} \tag{15} \]

In (15) we have rewritten AA as $A^2$, just as one would with a single
variable. However, we have yet to define what the product of two
matrices is. The first line of (15) says that to get $\mathbf{p}''$ we must multiply
p by A twice. It should be possible to "multiply" A times A and then
multiply the resulting $A^2$ times p to obtain $\mathbf{p}''$. We shall show how to
do this matrix multiplication shortly.

With this notation, we can write the probability distribution
vector $\mathbf{p}^{(3)}$ for the frog 3 minutes from now as

\[ \mathbf{p}^{(3)} = A\mathbf{p}'' = A(A^2\mathbf{p}) = A^3\mathbf{p} \tag{16} \]

Generalizing this formula, we find that the probability
distribution $\mathbf{p}^{(n)}$ for the frog in n minutes is given by

\[ \mathbf{p}^{(n)} = A^n\mathbf{p} \tag{17} \]

Note how concisely we can write the complex calculations for $\mathbf{p}''$, $\mathbf{p}^{(3)}$,
and $\mathbf{p}^{(n)}$ using matrix notation. It would be impossible to analyze
properties of Markov chains without such notation.
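Even before matrix-matrix multiplication is defined, the distributions p'', p^(3), and so on can be obtained by applying p' = Ap repeatedly. The Python sketch below does this for the frog chain; the function names are ours, and the entries of A are taken from the coefficients of the system (13) as we read it, which is an assumption.

```python
# Frog transition matrix: column j holds the probabilities of leaving state j.
A = [
    [0.50, 0.25, 0.00, 0.00, 0.00, 0.00],
    [0.50, 0.50, 0.25, 0.00, 0.00, 0.00],
    [0.00, 0.25, 0.50, 0.25, 0.00, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.00, 0.25, 0.50, 0.50],
    [0.00, 0.00, 0.00, 0.00, 0.25, 0.50],
]

def next_distribution(A, p):
    """One period of the chain: p' = Ap."""
    return [sum(a * x for a, x in zip(row, p)) for row in A]

def distribution_after(A, p, n):
    """Apply p' = Ap a total of n times, giving the distribution after n periods."""
    for _ in range(n):
        p = next_distribution(A, p)
    return p

start = [1, 0, 0, 0, 0, 0]        # frog is in state 1 with certainty
after_two = distribution_after(A, start, 2)
after_three = distribution_after(A, start, 3)
```

Each distribution produced this way still sums to 1, as a probability distribution must.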


In Example 6 we wrote $A^2$ and other powers of matrix A without ever
defining what matrix multiplication was. To the question as to how matrix
multiplication is defined, our reply is that it should be defined to make matrix
multiplication a useful operation. In this instance it should be defined to
make the formulas in Example 6 valid.

The next example further motivates matrix multiplication and shows 
us how to do the computation. 


Example 7. A Collection of Computer 
Computation Times 


In Example 3 we computed the time it would take four different
computers, Superduper, Wacko, Whooper, and Ultima, to do a set of jobs.
We were given a matrix A telling how many minutes it took each
computer to do each of three types of job.

                 Type of Job
                 1    2    3
    Superduper   3    4    2
A = Wacko        5    7    3      Matrix of times      (18)
    Whooper      1    2    1
    Ultima       3    3    3

We calculated how long it would take each computer to do 6 type 1,
8 type 2, and 10 type 3 jobs by multiplying A times the vector
n = [6, 8, 10]:

\[
A\mathbf{n} = \begin{bmatrix} 3 & 4 & 2 \\ 5 & 7 & 3 \\ 1 & 2 & 1 \\ 3 & 3 & 3 \end{bmatrix}
\begin{bmatrix} 6 \\ 8 \\ 10 \end{bmatrix}
= \begin{bmatrix} 3\times 6 + 4\times 8 + 2\times 10 \\ 5\times 6 + 7\times 8 + 3\times 10 \\ 1\times 6 + 2\times 8 + 1\times 10 \\ 3\times 6 + 3\times 8 + 3\times 10 \end{bmatrix}
= \begin{bmatrix} 70 \\ 116 \\ 32 \\ 72 \end{bmatrix} \tag{19}
\]


Now let us do this calculation not for one set of jobs, but for three 
sets of jobs. Set A will be the previous set n = [6, 8, 10]. Sets B and 
C will be [2, 5, 5] and [4, 4, 4]. Let us calculate the times required 
to do each set on each computer by expanding the vector n in (19) 
into a matrix N of three column vectors. 


            Sets of Jobs
             A    B    C
    Type 1   6    2    4
N = Type 2   8    5    4      Matrix of jobs
    Type 3  10    5    4


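The calculation of An in (19) required us to multiply each row of A
times the vector n. Now we need to multiply each row of A (one for
each computer) times each column of N (one for each set of jobs):

\[
AN = \begin{bmatrix} 3 & 4 & 2 \\ 5 & 7 & 3 \\ 1 & 2 & 1 \\ 3 & 3 & 3 \end{bmatrix}
\begin{bmatrix} 6 & 2 & 4 \\ 8 & 5 & 4 \\ 10 & 5 & 4 \end{bmatrix}
\]

\[
= \begin{bmatrix}
3\times 6 + 4\times 8 + 2\times 10 & 3\times 2 + 4\times 5 + 2\times 5 & 3\times 4 + 4\times 4 + 2\times 4 \\
5\times 6 + 7\times 8 + 3\times 10 & 5\times 2 + 7\times 5 + 3\times 5 & 5\times 4 + 7\times 4 + 3\times 4 \\
1\times 6 + 2\times 8 + 1\times 10 & 1\times 2 + 2\times 5 + 1\times 5 & 1\times 4 + 2\times 4 + 1\times 4 \\
3\times 6 + 3\times 8 + 3\times 10 & 3\times 2 + 3\times 5 + 3\times 5 & 3\times 4 + 3\times 4 + 3\times 4
\end{bmatrix}
\]

                  Sets of Jobs
                  A     B     C
    Superduper   70    36    36
  = Wacko       116    60    60      Matrix of total
    Whooper      32    17    16      computation times      (20)
    Ultima       72    36    36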
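The computation in (20) is a matrix-vector product repeated once per column of N. A minimal Python sketch of that view follows; the function names are ours, and the data are the time and job matrices of this example.

```python
def mat_vec(A, b):
    """Matrix-vector product: scalar product of each row of A with b."""
    return [sum(a * x for a, x in zip(row, b)) for row in A]

def times_for_all_sets(A, N):
    """Compute AN by applying A to each column of N, one set of jobs at a time."""
    columns = [mat_vec(A, [row[j] for row in N]) for j in range(len(N[0]))]
    # Reassemble the column results into rows (one row per computer).
    return [[col[i] for col in columns] for i in range(len(A))]

A = [[3, 4, 2],    # Superduper
     [5, 7, 3],    # Wacko
     [1, 2, 1],    # Whooper
     [3, 3, 3]]    # Ultima
N = [[6, 2, 4],    # type 1 jobs in sets A, B, C
     [8, 5, 4],    # type 2 jobs
     [10, 5, 4]]   # type 3 jobs

total_times = times_for_all_sets(A, N)
for name, row in zip(["Superduper", "Wacko", "Whooper", "Ultima"], total_times):
    print(name, row)
```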


Formalizing the computation process in this example yields a method 
for extending matrix-vector multiplication to matrix-matrix multiplication. 


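Matrix Multiplication. Let A be an m-by-r matrix and B be an
r-by-n matrix. The number of columns in A must equal the number of
rows in B. Then the matrix product AB is an m-by-n matrix obtained
by forming the scalar product of each row in A with each column in
B. That is, the (i, j)th entry in AB is $\mathbf{a}_i^R \cdot \mathbf{b}_j^C$, where $\mathbf{a}_i^R$ is the ith row
of A and $\mathbf{b}_j^C$ is the jth column of B.

\[
AB = \begin{bmatrix} \mathbf{a}_1^R \\ \mathbf{a}_2^R \\ \vdots \\ \mathbf{a}_m^R \end{bmatrix}
[\mathbf{b}_1^C,\ \mathbf{b}_2^C,\ \ldots,\ \mathbf{b}_n^C]
= \begin{bmatrix}
\mathbf{a}_1^R\cdot\mathbf{b}_1^C & \mathbf{a}_1^R\cdot\mathbf{b}_2^C & \cdots & \mathbf{a}_1^R\cdot\mathbf{b}_n^C \\
\mathbf{a}_2^R\cdot\mathbf{b}_1^C & \mathbf{a}_2^R\cdot\mathbf{b}_2^C & \cdots & \mathbf{a}_2^R\cdot\mathbf{b}_n^C \\
\vdots & \vdots & & \vdots \\
\mathbf{a}_m^R\cdot\mathbf{b}_1^C & \mathbf{a}_m^R\cdot\mathbf{b}_2^C & \cdots & \mathbf{a}_m^R\cdot\mathbf{b}_n^C
\end{bmatrix} \tag{A}
\]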
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ey ae 2. “ete 
oe ere ~ Ree 


l-4+2-6 Seb ed 


if 


en aB=| 14 toe a We er cr, 


There are several ways to interpret matrix multiplication: first,
as the scalar product of each row of A with each column of B, as in
(A). Next, we can adopt the point of view of Example 7, where the
product AB was an extension of the matrix-vector product Ab to the
matrix-vector products of A with each column of B.

\[
AB = A[\mathbf{b}_1^C,\ \mathbf{b}_2^C,\ \ldots,\ \mathbf{b}_n^C]
= [A\mathbf{b}_1^C,\ A\mathbf{b}_2^C,\ \ldots,\ A\mathbf{b}_n^C] \tag{B}
\]

For A, B above, check that the first column of AB is $A\mathbf{b}_1^C$.

Finally, we could also view AB as an extension of the vector-matrix
product aB to the vector-matrix products of B with each row of A.

\[
AB = \begin{bmatrix} \mathbf{a}_1^R \\ \mathbf{a}_2^R \\ \vdots \\ \mathbf{a}_m^R \end{bmatrix} B
= \begin{bmatrix} \mathbf{a}_1^R B \\ \mathbf{a}_2^R B \\ \vdots \\ \mathbf{a}_m^R B \end{bmatrix} \tag{C}
\]

For A, B above, check that the first row of AB is $\mathbf{a}_1^R B$.


Equivalent Definitions of Matrix Multiplication AB


(A) Entry (i, j) of AB is scalar product $\mathbf{a}_i^R \cdot \mathbf{b}_j^C$.
(B) Column j of AB is matrix-vector product $A\mathbf{b}_j^C$.
(C) Row i of AB is vector-matrix product $\mathbf{a}_i^R B$.
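The three equivalent definitions can be compared directly in code. In the Python sketch below each function builds AB by one of the viewpoints (A), (B), (C); the function names and the small matrices are hypothetical choices for illustration.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def column(B, j):
    return [row[j] for row in B]

def mat_vec(A, b):          # matrix-vector product Ab
    return [dot(row, b) for row in A]

def vec_mat(a, B):          # vector-matrix product aB
    return [dot(a, column(B, j)) for j in range(len(B[0]))]

def product_by_entries(A, B):
    """(A): entry (i, j) is the scalar product of row i of A and column j of B."""
    return [[dot(A[i], column(B, j)) for j in range(len(B[0]))]
            for i in range(len(A))]

def product_by_columns(A, B):
    """(B): column j of AB is A times column j of B."""
    columns = [mat_vec(A, column(B, j)) for j in range(len(B[0]))]
    return [[col[i] for col in columns] for i in range(len(A))]

def product_by_rows(A, B):
    """(C): row i of AB is row i of A times B."""
    return [vec_mat(row, B) for row in A]

# Hypothetical 2-by-2 and 2-by-3 matrices.
A = [[1, 2], [3, 4]]
B = [[4, 0, 1], [6, 1, 2]]

results = [f(A, B) for f in (product_by_entries, product_by_columns, product_by_rows)]
```

All three functions return the same 2-by-3 matrix, which has as many rows as A and as many columns as B.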


Remember that for the matrix product AB to make sense, the length 
of the rows in A (= the number of columns in A) must equal the length of 
the columns of B (= the number of rows in B). Further, if A is m-by-r and 
B is r-by-n, then AB is an m-by-n matrix: AB has as many rows as A and 
as many columns as B. 

We shall see shortly that this form of matrix multiplication is exactly
what is needed for Markov chain calculations. If this definition of matrix
multiplication were given to a reader out of the blue, it would probably seem
quite strange and artificial. But from Example 7 and dozens of other
examples throughout this book we see that this strange definition is the
"natural" definition to give.


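Example 8. Matrix Multiplication Example


Let $A = \begin{bmatrix} 4 & 5 \\ 6 & 7 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 1 & 2 \\ 3 & 0 & 1 \end{bmatrix}$. Then AB is

\[
AB = \begin{bmatrix}
4\times 2 + 5\times 3 & 4\times 1 + 5\times 0 & 4\times 2 + 5\times 1 \\
6\times 2 + 7\times 3 & 6\times 1 + 7\times 0 & 6\times 2 + 7\times 1
\end{bmatrix}
= \begin{bmatrix} 23 & 4 & 13 \\ 33 & 6 & 19 \end{bmatrix}
\]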

Note that the order of the matrices in matrix multiplication makes a 
big difference. That is, if A and B were two square n-by-n matrices, the 
matrix products AB and BA would yield different results (except in unusual 
cases). In mathematical terms, we say that matrix multiplication is noncom- 
mutative. 


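Example 9. Matrix Multiplication Is
Not Commutative


Let $A = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}$. Then

\[
AB = \begin{bmatrix}
1\times 1 + 0\times 0 & 1\times(-1) + 0\times 2 \\
3\times 1 + 1\times 0 & 3\times(-1) + 1\times 2
\end{bmatrix}
= \begin{bmatrix} 1 & -1 \\ 3 & -1 \end{bmatrix}
\]

\[
BA = \begin{bmatrix}
1\times 1 + (-1)\times 3 & 1\times 0 + (-1)\times 1 \\
0\times 1 + 2\times 3 & 0\times 0 + 2\times 1
\end{bmatrix}
= \begin{bmatrix} -2 & -1 \\ 6 & 2 \end{bmatrix}
\]

Thus AB ≠ BA.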
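Noncommutativity is easy to confirm numerically. The Python sketch below multiplies two 2-by-2 matrices in both orders; the entries are those we read from the worked computations in Example 9 (an assumption, since the displayed matrices are hard to read), and `mat_mul` is our own helper.

```python
def mat_mul(A, B):
    """Matrix product: entry (i, j) is row i of A dotted with column j of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 0], [3, 1]]
B = [[1, -1], [0, 2]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)
products_differ = AB != BA

# One of the "unusual cases": the identity matrix commutes with every matrix.
I = [[1, 0], [0, 1]]
identity_commutes = mat_mul(A, I) == mat_mul(I, A)
```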


Matrix multiplication is clearly quite tedious. It is easy to make a
mistake and multiply the wrong entries together. But with three simple loops,
a short computer program for matrix multiplication can be written to do the
work for you. (This is a beautiful example of the advantage computers have
in speed and accuracy for doing repetitive arithmetic.) Assume that A is an
m-by-r matrix, B is r-by-n, and so the product C = AB is m-by-n.

FOR I = 1 TO M
  FOR J = 1 TO N
    FOR K = 1 TO R
      C(I,J) = C(I,J) + A(I,K) * B(K,J)          (21)
    NEXT K
  NEXT J
NEXT I


We assume in this program that C(I,J) = 0 initially; otherwise, the statement
C(I,J) = 0 must be inserted just after FOR J = 1 TO N.
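For readers working in a modern language, the same three loops can be written in Python as below. This is a sketch rather than a translation of any particular BASIC dialect; the function name is ours, and the test matrices are the 2-by-2 and 2-by-3 matrices that appear to be those of Example 8.

```python
def matrix_product(A, B):
    """Three-loop product C = AB for an m-by-r matrix A and an r-by-n matrix B."""
    m, r, n = len(A), len(B), len(B[0])
    if any(len(row) != r for row in A):
        raise ValueError("columns of A must equal rows of B")
    C = [[0] * n for _ in range(m)]        # the required C(I,J) = 0 initialization
    for i in range(m):
        for j in range(n):
            for k in range(r):
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
    return C

A = [[4, 5], [6, 7]]
B = [[2, 1, 2], [3, 0, 1]]
C = matrix_product(A, B)
```

The innermost loop accumulates the scalar product of row i of A with column j of B, so the program performs m × n × r multiplications in all.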
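Example 10. Powers of Markov Chain
Transition Matrices

Let p denote the vector of the current probability distribution. In
Example 6 we showed that the system of equations to compute the
probability distribution vector $\mathbf{p}'$ for the next period can be written as
$\mathbf{p}' = A\mathbf{p}$, and that the probability distribution vector $\mathbf{p}''$ after 2 minutes
is

\[ \mathbf{p}'' = A\mathbf{p}' = A(A\mathbf{p}) \overset{?}{=} A^2\mathbf{p} \tag{22} \]

Similarly, the distribution $\mathbf{p}^{(3)}$ after three periods is

\[ \mathbf{p}^{(3)} = A(A(A\mathbf{p})) \overset{?}{=} A^3\mathbf{p} \]

The $\overset{?}{=}$ means that the step is yet to be proven. In (22), we want $A^2\mathbf{p}$
to be the same as $A(A\mathbf{p})$, that is, premultiplying p by $A^2$ should be
the same as premultiplying p twice by A; and similarly for $A^3$.

Let us first compute $A^2$ and $A^3$ for the weather Markov chain
introduced in Section 1.3.

             Sunny   Cloudy
A = Sunny     3/4     1/2
    Cloudy    1/4     1/2

Then

\[
A^2 = \begin{bmatrix} \tfrac34 & \tfrac12 \\ \tfrac14 & \tfrac12 \end{bmatrix}
\begin{bmatrix} \tfrac34 & \tfrac12 \\ \tfrac14 & \tfrac12 \end{bmatrix}
= \begin{bmatrix}
\tfrac34\times\tfrac34 + \tfrac12\times\tfrac14 & \tfrac34\times\tfrac12 + \tfrac12\times\tfrac12 \\
\tfrac14\times\tfrac34 + \tfrac12\times\tfrac14 & \tfrac14\times\tfrac12 + \tfrac12\times\tfrac12
\end{bmatrix}
= \begin{bmatrix} \tfrac{11}{16} & \tfrac58 \\ \tfrac{5}{16} & \tfrac38 \end{bmatrix} \tag{23}
\]

Next we compute $A^3$:

\[
A^3 = AA^2 = \begin{bmatrix}
\tfrac34\times\tfrac{11}{16} + \tfrac12\times\tfrac{5}{16} & \tfrac34\times\tfrac58 + \tfrac12\times\tfrac38 \\
\tfrac14\times\tfrac{11}{16} + \tfrac12\times\tfrac{5}{16} & \tfrac14\times\tfrac58 + \tfrac12\times\tfrac38
\end{bmatrix}
= \begin{bmatrix} \tfrac{43}{64} & \tfrac{21}{32} \\ \tfrac{21}{64} & \tfrac{11}{32} \end{bmatrix} \tag{24}
\]

The entries in $A^2$ will be transition probabilities for two periods
and the entries in $A^3$ transition probabilities for three periods. For
example, the value 5/16 in entry (2, 1) of $A^2$ should mean that if we are
now in state 1 (Sunny), the chance is 5/16 that in 2 days we will be in
state 2 (Cloudy). The value of 21/64 in entry (2, 1) of $A^3$ tells us that if
now Sunny, the probability is 21/64 that in 3 days it will be Cloudy.
The values we obtained in computing $A^2$ and $A^3$ look reasonable.
In particular, the numbers in each column of $A^2$ and $A^3$ sum to 1.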
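The powers of the weather transition matrix can be recomputed with exact fractions. The Python sketch below assumes the transition matrix has columns (3/4, 1/4) for Sunny and (1/2, 1/2) for Cloudy, which is how we read the display above; `mat_mul` is our own helper.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Matrix product using the scalar product of rows of A with columns of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Columns: today Sunny, today Cloudy. Rows: tomorrow Sunny, tomorrow Cloudy.
A = [[Fraction(3, 4), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(1, 2)]]

A2 = mat_mul(A, A)     # two-period transition probabilities
A3 = mat_mul(A, A2)    # three-period transition probabilities

column_sums_A3 = [sum(A3[i][j] for i in range(2)) for j in range(2)]
```

Using `Fraction` avoids rounding, so the column sums of each power come out as exactly 1.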


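We must check that matrix multiplication has given the correct
probabilities in $A^2$ and $A^3$ in (23) and (24). In general, entry (i, j) in $A^2$ should
be the probability that if we are now in state j, then in two periods we shall
be in state i. By the scalar-product definition of matrix multiplication,

\[
AA = \begin{bmatrix} \mathbf{a}_1^R \\ \mathbf{a}_2^R \\ \vdots \\ \mathbf{a}_n^R \end{bmatrix}
[\mathbf{a}_1^C,\ \mathbf{a}_2^C,\ \ldots,\ \mathbf{a}_n^C]
= \begin{bmatrix}
\mathbf{a}_1^R\cdot\mathbf{a}_1^C & \mathbf{a}_1^R\cdot\mathbf{a}_2^C & \cdots & \mathbf{a}_1^R\cdot\mathbf{a}_n^C \\
\mathbf{a}_2^R\cdot\mathbf{a}_1^C & \mathbf{a}_2^R\cdot\mathbf{a}_2^C & \cdots & \mathbf{a}_2^R\cdot\mathbf{a}_n^C \\
\vdots & \vdots & & \vdots \\
\mathbf{a}_n^R\cdot\mathbf{a}_1^C & \mathbf{a}_n^R\cdot\mathbf{a}_2^C & \cdots & \mathbf{a}_n^R\cdot\mathbf{a}_n^C
\end{bmatrix} \tag{25}
\]

where

\[ \mathbf{a}_i^R\cdot\mathbf{a}_j^C = a_{i1}a_{1j} + a_{i2}a_{2j} + \cdots + a_{in}a_{nj} \tag{26} \]

In the case of a 2-by-2 transition matrix, (25) is

\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
= \begin{bmatrix}
a_{11}a_{11} + a_{12}a_{21} & a_{11}a_{12} + a_{12}a_{22} \\
a_{21}a_{11} + a_{22}a_{21} & a_{21}a_{12} + a_{22}a_{22}
\end{bmatrix}
\]

In words, we interpret (26) as follows: The probability of going from
state j to state i in two periods is obtained by finding the probability of going
from state j to state 1 (in the first period) and then from state 1 to state i (in
the second period), plus the probability of going from j to 2 and then from
2 to i, plus from j to 3 and then from 3 to i, and so on. This is exactly the
probability of going from state j now to state i in two periods.

This argument extends to show that $A^3$ is the three-period transition
matrix and $A^k$ is the k-period transition matrix.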


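Example 10 (continued). Powers of Markov
Transition Matrices


Let us turn now to the frog Markov chain. Recall that the transition
matrix is

\[
A = \begin{bmatrix}
.50 & .25 & 0 & 0 & 0 & 0 \\
.50 & .50 & .25 & 0 & 0 & 0 \\
0 & .25 & .50 & .25 & 0 & 0 \\
0 & 0 & .25 & .50 & .25 & 0 \\
0 & 0 & 0 & .25 & .50 & .50 \\
0 & 0 & 0 & 0 & .25 & .50
\end{bmatrix} \tag{27}
\]

Using the computer program given above, one can compute $A^2$ and
$A^3$. Written as exact fractions, they are

\[
A^2 = \frac{1}{16}\begin{bmatrix}
6 & 4 & 1 & 0 & 0 & 0 \\
8 & 7 & 4 & 1 & 0 & 0 \\
2 & 4 & 6 & 4 & 1 & 0 \\
0 & 1 & 4 & 6 & 4 & 2 \\
0 & 0 & 1 & 4 & 7 & 8 \\
0 & 0 & 0 & 1 & 4 & 6
\end{bmatrix} \tag{28}
\]

and

\[
A^3 = \frac{1}{64}\begin{bmatrix}
20 & 15 & 6 & 1 & 0 & 0 \\
30 & 26 & 16 & 6 & 1 & 0 \\
12 & 16 & 20 & 15 & 6 & 2 \\
2 & 6 & 15 & 20 & 16 & 12 \\
0 & 1 & 6 & 16 & 26 & 30 \\
0 & 0 & 1 & 6 & 15 & 20
\end{bmatrix} \tag{29}
\]

Looking at entry (2, 1) in $A^2$ and $A^3$, we see that if the frog were now
in state 1, then in 2 minutes the frog has probability .5 of being in
state 2 and in 3 minutes it has probability .468 of being in state 2.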
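The entries quoted for the frog chain can be reproduced with a few lines of Python. In this sketch `mat_mul` and `mat_power` are our own helper names, and the transition matrix is entered from the frog system of equations in Example 6 as we read it, so treat the matrix as an assumption.

```python
def mat_mul(A, B):
    """Matrix product via the scalar product of rows of A with columns of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_power(A, k):
    """A multiplied by itself k times in all (k >= 1)."""
    result = A
    for _ in range(k - 1):
        result = mat_mul(result, A)
    return result

A = [
    [0.50, 0.25, 0.00, 0.00, 0.00, 0.00],
    [0.50, 0.50, 0.25, 0.00, 0.00, 0.00],
    [0.00, 0.25, 0.50, 0.25, 0.00, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.00, 0.25, 0.50, 0.50],
    [0.00, 0.00, 0.00, 0.00, 0.25, 0.50],
]

A2 = mat_power(A, 2)
A3 = mat_power(A, 3)

# Entry (2, 1) in the text's 1-based numbering is index [1][0] here.
two_minute = A2[1][0]
three_minute = A3[1][0]
```

The three-minute value is 30/64 = .46875, which agrees with the figure .468 quoted above when the decimal is cut off after three places.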




Optional Example on Linear Filtering (Revisited) 


In Example 4 of Section 1.5 we applied linear filtering to a noisy
pattern of darkness levels that might have come from the TV eye of a
robot. The array of readings, numbers between 0 and 8, represented
levels of darkness. The initial array is given below. First we give
the matrix D of darkness levels and beneath it the array of darkness
characters.
ts ew Se Bees 2 rae eG 
ae es Soe Bien 3S PL Sa 2 
aay a ae be Le ee Se 
7. ee OF Ga Ch! Be 2 eh) Se 
8. dt det RS OP Go a Se 
Bo PO "Be eo, BOG Pe Be 4, 2 
D= = (30) 
"Oe aks eT ak alee Be mer es i 
2: ae Gee ote Bn ee ah ON ae 
FS eR ae San FT PO AS 
b Oh ee OS ee Ss ee Se ee 
SET Oy. 2 PaaS 
2. Be BOS ot Si BP Oo ee oe 
[Data plot: array of darkness characters corresponding to the matrix D.]


In Section 1.5 we obtained a filtered matrix D' (Data Plot 5) by
applying linear filtering to D. We computed each entry $d'_{ij}$ in D'
individually from the weighted-average formula

\[
d'_{ij} = \tfrac{1}{36}\bigl[\,16d_{ij} + 4(d_{i,j-1} + d_{i,j+1} + d_{i-1,j} + d_{i+1,j})
+ 1(d_{i-1,j-1} + d_{i-1,j+1} + d_{i+1,j-1} + d_{i+1,j+1})\,\bigr] \tag{31}
\]

Now we shall give a 12-by-12 filtering matrix F that performs the
filtering (31) on the whole darkness matrix D at once using matrix
multiplication.
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e 2° Oe Ooh obs Be Oe OG 
re © £0 OO GeO Oo 6 2 0 
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We claim that the filtered matrix D’ equals the product matrix 
D' = FDF (33) 


A closer examination of why (33) is true and computation with (33)
are left to the Exercises.
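The claim D' = FDF can be tested on a small array. The Python sketch below assumes that F has 4/6 on the main diagonal and 1/6 on the two adjacent diagonals; this is our guess at the interior structure of F, and the treatment of the first and last rows in the text's 12-by-12 matrix may differ. The 5-by-5 array D is hypothetical, and only interior entries are compared with the weighted-average formula.

```python
from fractions import Fraction

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def tridiagonal_filter(n):
    """Assumed filter matrix: 4/6 on the diagonal, 1/6 just above and below it."""
    F = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        F[i][i] = Fraction(4, 6)
        if i > 0:
            F[i][i - 1] = Fraction(1, 6)
        if i < n - 1:
            F[i][i + 1] = Fraction(1, 6)
    return F

def filtered_entry(D, i, j):
    """Weighted average with weights 16, 4, 1 (over 36) for an interior entry."""
    edges = D[i][j - 1] + D[i][j + 1] + D[i - 1][j] + D[i + 1][j]
    corners = (D[i - 1][j - 1] + D[i - 1][j + 1]
               + D[i + 1][j - 1] + D[i + 1][j + 1])
    return Fraction(16 * D[i][j] + 4 * edges + corners, 36)

# Hypothetical darkness levels between 0 and 8.
D = [[0, 1, 2, 1, 0],
     [1, 5, 8, 4, 1],
     [2, 8, 7, 8, 2],
     [0, 4, 8, 5, 1],
     [1, 0, 2, 1, 0]]

F = tridiagonal_filter(5)
FDF = mat_mul(mat_mul(F, D), F)
interior_matches = all(FDF[i][j] == filtered_entry(D, i, j)
                       for i in range(1, 4) for j in range(1, 4))
```

Multiplying by F on the left averages each entry with its vertical neighbors, and multiplying by F on the right does the same horizontally, which is why the combined weights are the products 4 × 4, 4 × 1, and 1 × 1 over 36.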


Section 2.2 Exercises 


Summary of Exercises 

Exercises 1-15 deal with matrix-vector products. Exercises 16-34 deal with 
matrix multiplication; Exercises 29-34 look at some general classes of matri- 
ces. Exercises 35-39 involve writing or using computer programs. Exercises 
40 and 41 cover material in the optional final pages of the section. 


1. Let a = [1, 2, 3], b = [-1, 3, -1], c = [2, 5, 8]. Compute
(a) a · b (b) b · c (c) a · (b + c) (d) a · a
(e) Show that for any a, a · a is the sum of squares of entries.


2. In Example 2 of Section 1.5 we smoothed a time series with the
transformation $d_i' = (d_{i-1} + d_i + d_{i+1})/3$. If $\mathbf{d}_i$ is the vector
$\mathbf{d}_i = [d_{i-1}, d_i, d_{i+1}]$, define a vector c so that $d_i' = \mathbf{c}\cdot\mathbf{d}_i$.


3. Let a, b, c be as in Exercise 1. Let 


5 4 
b. BAS Ue I On J ars 
A=/]2 4 6 8|, B=]2 -2 ft een er ee 
SS" 2 0 ea 
Ob es 


Which of the following matrix calculations are well defined (the sizes 
match)? If the computation makes sense, perform it. 
(a) aA (b) bB (c) cC (d) Aa (e) Bb (f) Ce 
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4. Calculate the following expressions, unless the sizes do not match. The 


Le 


vectors a, b, c are as defined in Exercise |, and matrices A, B, C are 
as in Exercise 3. 
(a) (aB)-c (b) (a + @)A (c) b(A + B) 


5. Suppose that you want to buy 5 cantaloupes, 4 apples, 3 oranges, and
2 pineapples. You comparison shop and find that at store A the costs
of these four fruits, respectively, are 30 cents, 10 cents, 10 cents, and
75 cents apiece, while at store B the costs are 25 cents, 15 cents, 8
cents, and 80 cents.

(a) Express the problem of determining the cost of this set of fruit at
each store as a matrix-vector product; write out the matrix and
vector.

(b) Compute the costs of the fruits at the two stores.


6. Suppose that you want to have a party catered and will need 10 hero
sandwiches, 6 quarts of fruit punch, 3 quarts of potato salad, and 2
plates of hors d'oeuvres. The following matrix gives the costs of these
supplies from three different caterers.


                  Caterer A   Caterer B   Caterer C
Hero sandwich       $5          $5          $4
Fruit punch         $1          $1.50       $.75
Potato salad        $.75        $1.00       $1
Hors d'oeuvres      $8          $7          $10


(a) Express the problem of determining the costs of catering the party
by each caterer as a matrix-vector product (be careful whether you
place the vector first or second in the product).

(b) Determine the costs of catering with each caterer.


7. Write the following systems of equations in matrix notation. Define any
matrices or vectors you use.

(a) 3x, + 4x, = 5 (b) 2x, + x, — 2x, = 0 (cC) x, = 2x, -— x, 
2X, -— 3% = 3 x + 3x; SE 2X5 
| 3x, = x; =5 x, = 4x, — 3x, 


3 X> 


8. Write the following linear programs in matrix notation. Define any
matrices or vectors that you use.


(a) Maximize 3x, + 10x, (b) Minimize 5x, + 5x, + 5x, 
| OS ee Teh =O, x, 20 

2X, + & S20 x, + Ze +. 3k, = 20 

xX + 3.2 10 Bh, As 4 25 25 


4x, + 2x5 = 35 2X = Xs + 3X SYS 
9. Write the rabbit-fox equations


Ro = RK + IR S6iSP 
Po ose Pe 2 Se 


in matrix form using p = [R, F], p' = [R', F'], and
wis = 15 

A ee 1 
10. Consider the system of equations


2X, - 3X5 - 2X —_ DY a 2y> = 3y3 + 200 
x | = is 4x, 9 3x 3 = 6y | = 4y, + 4y, = | 20 
5X, + 2% — Xs = 2y, — 2y, + 3350 


(a) Write this system of equations in matrix form. Define the vectors 
and matrices you introduce. 

(b) Rewrite in matrix form with all the variables on the left side (and 
just numbers on the right). 


11. Three different types of computers need varying amounts of four
different types of integrated circuits. The following matrix A gives the
number of each circuit needed by each computer.


Circuits 
Lb <2) 3) 4 
AEE, a eee 
A = Computers B]5 1 3 2 
Cl i. 2 2 


Let d = [10, 20, 30] be the computer demand vector (how many of 
each type of computer is needed). Let p = [$2, $5, $1, $10] be the 
price vector for the circuits (the cost of each type of circuit). 

Write an expression in terms of A, d, p for the total cost of the 
circuits needed to produce the set of computers in demand; indicate 
where the matrix-vector product occurs and where the vector product 
occurs. Compute this total cost. 


12. For the frog Markov chain in Example 6 it was noted in Section 1.2
that p* = [.1, .2, .2, .2, .2, .1] is a stable distribution. In matrix
algebra, this means that p* = Ap*, where A is the frog Markov
transition matrix. Verify that p* = Ap* for this p*.


13. One can express polynomial multiplication in terms of a matrix-vector
product as follows: to multiply the quadratic 2x^2 + 3x + 4 by
1x^2 - 2x + 5, we multiply

\[
\begin{bmatrix} 2 & 0 & 0 \\ 3 & 2 & 0 \\ 4 & 3 & 2 \\ 0 & 4 & 3 \\ 0 & 0 & 4 \end{bmatrix}
\begin{bmatrix} 1 \\ -2 \\ 5 \end{bmatrix}
\]

The resulting vector will give the coefficients in the product. Confirm
this. For the polynomial multiplication (x^3 + 2x^2 + 3x + 4) ×
(4x^2 - 3x + 1), write the associated matrix-vector product.


14. Let 1 denote a vector of all 1's. Show for a Markov transition matrix
A that 1 · a_j^C = 1 (where a_j^C is the jth column of A). Then show that
1A = 1.


15. In the problem of smoothing a time series introduced in Example 2 of
Section 1.5, we start with a time-series vector d of data values. We
want to smooth the time series d into d' with the transformation on the
entries d'_i = (d_{i-1} + d_i + d_{i+1})/3. Show that d' = Sd, where S is
the matrix


+ 4+ 00 0 
$3 3 00 
0% 3 3 0 
60% % 4 


16. Indicate which pairs of the following matrices can be multiplied together
and give the size of the resulting product. 
(i) A 3-by-7 matrix A 
(ii) A 2-by-3 matrix B 
(iii) A 3-by-3 matrix C 
(iv) A 2-by-2 matrix D 
(v) A 7-by-2 matrix E 


I 2 ore 0 ee 
= , = e = . t 
Let A F ‘ B F 10 ] C f s Compute the 


following matrix products (if possible). 
(a) AB (b) AC (c) BC (d) CA (e) (CA)B 


18. Let 
s te | 
rr ge 1 oO -!] ; 9 
A=i1246 81, B=]2 -2 Bree ON 4 ao 
1 -1 
Mw ee ae 9 0 94 
Compute these matrix products (if possible).
(a) AB (b) BA (c) AC (d) CA (e) CB


19. Compute just one row or column, as requested, in the following matrix
products (A, B, C are as in Exercise 18). 
(a) Row 1 in B? (b) Column 2 in AC (c) Column 3 in CB. 


20. Show that AB = BA for the matrices
i if ‘3 -!1 
A= and B = 
aes —2 ] 
(Normally, matrix multiplication does not commute.)


21. For A, B, C in Exercise 18, compute entry (2, 3) in (BA)C.


22. Suppose that we are given the following matrices involving the costs
of fruits at different stores, the amounts of fruit different types of people 
want, and the numbers of people of different types in different towns. 


Store A Store B Apple Orange Pear 
Apple 10 Sh Person A 5 10 3 
Orange 15 .20 Person B 4 5 5 
Pear 10 10 


          Person A   Person B
Town 1      1000        500
Town 2      2000       1000


(a) Compute a matrix that tells how much each person's fruit purchases
cost at each store.

(b) Compute a matrix that tells how many of each fruit will be
purchased in each town.

(c) Compute a matrix that tells the total cost of everyone's fruit
purchases in town 1 and in town 2 when people use store A and when
they use store B (a different number for each town and each store).


23. Express in matrix notation the following operations on these arrays of
data: Matrix A gives the amount of time each of three jobs requires of 
I/O (input/output), of Execution time, and System overhead; matrix B 
gives the charges (per unit of time) of different computer activities under 
two different charging plans; matrix C (actually a vector) tells how 
many jobs of each type there are; and matrix D tells the fraction of the 
time that each time-charging plan (the columns in matrix B) is used 
each day. 
                 Time                              Time Charges

A         I/O   Execution   System      B            Plan I   Plan II
Job A      5       20         10        I/O            2         3
Job B      4       25          8        Execution      6         5
Job C     10       10          5        System         3         4

Number of Jobs of Fraction 
C Each Type D of Time 
Job A 4 Plan | ry 
Job B 5 Plan U SF 
Job C | 3 


Compute the following arrays using A, B, C, and D. 

(a) Total cost of each type of job for each charge plan. 

(b) Total amount of I/O, Execution, and System time for all the jobs 
(all jobs are summarized in matrix C). 

(c) Total cost of all jobs when run under plan I and under plan II.

(d) Average cost of 1 unit of I/O, of Execution, of System time.
Hint: Use matrix D.

(e) Average cost of each type of job (job A, job B, job C). 


24. Express in matrix notation the following operations on these arrays of
data: Matrix A gives the amounts of raw material required to build
different products; matrix B gives the costs of these raw materials in
two different countries; matrix C tells how many of the products are
needed to build two types of houses; and matrix D gives the demand
for houses in the two countries.


            Raw Material
A          Wood   Labor   Steel
Item A       5     20      10
Item B       4     25       8
Item C      10     10       5


          Cost by Country
B          Spain   Italy
Wood        $2      $3
Labor       $6      $5
Steel       $3      $4


            Items Needed in House
C           Item A   Item B   Item C
House I       4        8        3
House II      5        5        2


          Demand for Houses
D          House I    House II
Spain       50,000    200,000
Italy       80,000    300,000


(a) Compute the first row in the matrix product AB. 

(b) Which matrix product tells how much of the different items are 
needed to meet the demand for houses (types I and II combined) 
in the different countries? 

(c) Which matrix product gives the cost of building each type of house 
in each country? 
(d) Which entry in what matrix product would give the total cost of
building all homes in Spain?


Note: If the rows and columns in a matrix must be interchanged 
in a product, indicate this by using the transpose of the matrix 
(transposes are not formally introduced until Section 2.4). 


25. Express in matrix notation the following operations on these arrays of
data: Matrix A gives the number of tradesmen needed each day to build
different types of small stores; matrix B gives the number of days it
takes to build each type of store in each state; matrix C gives the cost
of tradesmen (per day) in New York and Texas; and matrix D gives the
number of stores of each type needed in two different sorts of shopping
centers.


A          Carpenter   Electrician   Bricklayer
Store A        5            2             1
Store B        4            2             2
Store C        3            1             1


B          New York   Texas
Store A       20        15
Store B       30        25
Store C       20        20


C             New York   Texas
Carpenter       $100      $60
Electrician     $80       $50
Bricklayer      $80       $60


           Shopping    Shopping
D          Center I    Center II
Store A       10           5
Store B       10          10
Store C       20          20


(a) Compute the first column in AC. 

(b) Which matrix product tells how many tradesmen per day are needed 
to build the stores in each type of shopping center? Do not compute 
this product. 

(c) Which entry in what matrix product would give the total cost of 
building three stores, one of each type, in New York? (Total cost 
covers all the days of construction. ) 


Note: If the rows and columns in a matrix must be interchanged in
some matrix product, indicate this by using the transpose of the
matrix (transposes are defined in Section 2.4).


26. Consider a growth model for the numbers of computers (C) and dogs
(D) from year to year: 

DP 30 +” DD 

C’ = 2C + 2D 
Let x = [C, D] be the initial vector and let x^(k) denote the vector of
computers and dogs after k years. Let A be the matrix of coefficients
in this system. Write x^(k) in terms of A and x.

27. Consider a variation on the Markov chain for Sunny and Cloudy
weather. The new transition matrix A is 

Sunny Cloudy 

_ Sunny 5 
Cloudy 3 


(a) Compute A^2. What probability does entry (1, 2) in A^2 represent?

(b) Compute A^3. What is the probability if sunny today that it is sunny
in 3 days?

(c) Compute A^4. What vector do the columns of A^k, for k = 2, 3, 4,
seem to be approaching?


28. (a) Show that if we multiply any 3-by-3 matrix A by

\[
I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

then the result IA (or AI) always equals A.
Hint: Make up a 3-by-3 matrix and multiply it by I.

(b) Show that if we premultiply A by

\[
K = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}
\]

the result is A except that the second row of A is doubled and the
third row of A is tripled.

(c) Suppose that K has k_1 in entry (1, 1), k_2 in (2, 2), k_3 in (3, 3), and
0's elsewhere. Describe the effect of premultiplying any 3-by-3
matrix A by K.


29. A square matrix is called diagonal if all its nonzero entries are on the
main diagonal. 


(a) Find the product AB of 


20 x 0 
A=]0 1 O and B=]0 2 
yk. 3 GO: @ 5 
(b) Suppose that a_11, a_22, a_33 are the diagonal entries in the diagonal
matrix A and b_11, b_22, b_33 are the diagonal entries in the diagonal
matrix B. Then what are the diagonal entries in AB?
30. (a) If we premultiply any 3-by-3 matrix A by

\[
Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\]

show that the resulting matrix QA is just A with the second and
third rows interchanged.
Hint: To see why this is so, make up a 3-by-3 matrix and
premultiply by Q.

(b) If we premultiply any matrix A by

\[
R = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\]

show that the result RA is a matrix with the rows of A reversed
and the values of all entries in the second row doubled.

(c) If we premultiply any matrix A by

\[
S = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

show that the result SA is A except that twice the first row has
been subtracted from the second row.

(d) Construct a 3-by-3 matrix which when premultiplying a 3-by-3
matrix has the effect of adding four times the third row to the first
row.

(e) Construct a 3-by-3 matrix which when premultiplying a 3-by-3
matrix has the effect of subtracting twice the first row from the
second row and also adding three times the first row to the third
row.


31. (a) Show that if we postmultiply any 3-by-3 matrix A by the matrix
Q in Exercise 30, the resulting matrix AQ is A with the second
and third columns interchanged.

(b) Show that if we postmultiply A by R in Exercise 30, the resulting
matrix AR is A with the columns reversed and the values of all
entries in column two doubled.

(c) Show that if we postmultiply A by S in Exercise 30, the resulting
matrix AS is A except that twice the second column has been
subtracted from the first column.

(d) Construct a 3-by-3 matrix which when postmultiplying a 3-by-3
matrix has the effect of adding two times the third column to the
second column.

(e) Construct a 3-by-3 matrix which when postmultiplying a 3-by-3
matrix has the effect of subtracting four times the first column from
the second column and also adding twice the first column to the
third column.


32. Compute the matrix product


— NOK © 
Ww= oO - 


| 
2 
l 
0 


oO WH 
seo o 8 
oor °o 


2 
I 
0 
0 


oOo G2 a = 


A matrix is called upper triangular if the only nonzero entries are on
or above the main diagonal. The previous computation illustrated the
fact that the product of two upper triangular matrices is again upper
triangular. Give an explanation of why this is true for the product of
any two 4-by-4 upper triangular matrices (or more generally for any
size matrices).


33. Show that in a 2-by-2 matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, if one row is a multiple of the
other row, then one column is a multiple of the other column.


34. Let A and B be 2-by-2 matrices. If C = AB and the second row of A
is 3 times the first row of A (a_2^R = 3a_1^R), show that the second row of
C is 3 times the first row of C.

Hint: Compare c_11 = a_1^R · b_1^C with c_21 = a_2^R · b_1^C; similarly for c_12
versus c_22.


Exercises Involving Computer Programs 


35. Write a program to multiply a matrix times a vector, take the resulting
vector, and premultiply it again by the matrix, and so on a specified
number of times. (This is the computation needed to follow a Markov
chain over many periods.)


36. Write a program to read in two matrices and multiply them together.


37. Write a program to raise a (square) matrix to a specified power.


38. Use the program you wrote in Exercise 35 (or one supplied by the
instructor) to compute successive probability distributions for 20 periods
for the following Markov chains. Unless otherwise specified, assume
that the initial distribution vector is p = [1, 0, 0, . . .] (i.e., one starts
in state 1).

(a) The weather Markov chain (in Example 10).

(b) The frog Markov chain starting with p = [0, 0, 1/2, 1/2, 0, 0].

(c) The rat maze Markov chain in Exercise 8 of Section 1.3.

(d) The poker Markov chain in Exercise 9 of Section 1.3 with p =
[0, 0, 0, 1, 0, 0, 0] (you start with $3). What are your chances of
winning $6 by the end of 20 periods?

(e) The tank battle Markov chain in Exercise 10 of Section 1.3. What
are each tank's chances of winning the battle?


39. Use the program you wrote in Exercise 37 to raise the following Markov 
transition matrices to the twentieth power. In each case explain the 
pattern of values in the matrix. 

(a) The weather Markov chain (in Example 10). 

(b) The frog Markov chain. 

(c) The rat maze Markov chain in Exercise 8 of Section 1.3. 
(d) The poker Markov chain in Exercise 9 of Section 1.3. 

(e) The tank battle Markov chain in Exercise 10 of Section 1.3 


Exercises for Optional Part of Section 
40. Compute entry (3, 3) in the matrix product FDF in Example 11, and
verify that it agrees with the value given in Data Plot 5 of Section 1.5. 


41. (Difficult) (a) Verify that the matrix product FDF performs the filtering 
given by formula (31) in Example 11 [e.g., that entry (i, j) equals that 
formula]. 

(b) Build a filtering matrix G such that GDG performs the filtering
transformation

\[
d'_{ij} = \tfrac{1}{16}\bigl[\,4d_{ij} + 2(d_{i,j-1} + d_{i,j+1} + d_{i-1,j} + d_{i+1,j})
          + d_{i-1,j-1} + d_{i-1,j+1} + d_{i+1,j-1} + d_{i+1,j+1}\,\bigr]
\]
Most of the mathematics that is studied in high school involves numbers or
geometric objects. However, there are important fields of discrete
mathematics that work with sets of nonnumeric objects. One such field is graph
theory. These graphs are different from the graphs for plotting functions.
Graph theory is used extensively in computer science and systems analysis.

A graph G = (N, E) consists of a set N of nodes and a collection E
of edges that are pairs of nodes. There is a natural way to "draw" a graph.
We make a point for each node and draw lines linking the pairs of nodes in
the edges. For example, the graph G with node set N = {a, b, c, d, e} and
edge set E = {(a, b), (a, c), (a, d), (b, c), (b, d), (d, e), (e, e)} is drawn
in Figure 2.2. An edge may link a node with itself, called a loop edge, as
at node e in Figure 2.2.


Figure 2.2 
A flowchart for a computer program is a form of graph. The data
structures that are used to organize complex sets of data are graphs. Or- 
ganizational charts, electrical circuits, telephone networks, and road maps 
are other examples of graphs. Billions of dollars are spent every year ana- 
lyzing problems that are modeled in terms of graphs. 

One class of questions asked about graphs concerns paths. A path is 
a sequence of nodes with edges linking consecutive nodes. We may want 
to find the shortest path between two nodes, or determine whether or not 
any path exists between a given pair of nodes. Finding paths through graphs 
arises when one wants to route a set of telephone calls through a network 
between prescribed cities without exceeding the capacity of any edge. The 
question of whether a path exists between two nodes arises over and over 
again in studying the effect on networks of random disruption, say, due to 
lightning. For example, in a given 1000-edge network one might want to 
know the probability that if five randomly chosen edges are destroyed, the 
network will become disconnected. 

The purpose of mentioning all these graph problems is to motivate the 
importance of having good methods to represent and manipulate graphs in 
a computer. We frequently use matrices for representing graphs. 


Adjacency Matrix A(G) of a Graph G. A(G) tells which pairs of nodes
are adjacent (i.e., which pairs form edges). Entry a_ij = 1 if there is
an edge linking the ith and jth nodes; otherwise, a_ij = 0. Note that
a_ii = 0 unless there is a loop at the ith node.


The adjacency matrix A(G) of the graph G in Figure 2.2 is


            a  b  c  d  e
        a   0  1  1  1  0
        b   1  0  1  1  0
A(G) =  c   1  1  0  0  0        (1)
        d   1  1  0  0  1
        e   0  0  0  1  1


Matrix A(G) is symmetric; that is, a_ij = a_ji.
Let us see how a question about graphs can be solved in terms of this
matrix.


Example 1. Paths in Graphs 


A path is a sequence of nodes such that each consecutive pair of nodes 
in the path is linked by an edge. The /ength of a path is the number 
of edges along it. For example, in Figure 2.2, (a, b, d, e) is a path of 
length 3 between a and e. A single edge is a path of length 1. 
We claim that the ith and jth nodes can be joined by a path of
length 2 if and only if entry (i, j) in A^2(G), the square of the adjacency
matrix A(G), is positive.

First let us compute A^2(G) for the graph in Figure 2.2. To do
this, we must find the scalar product of each row of A(G) with each
column of A(G). Since A(G) is symmetric, this is equivalent to finding
the scalar product of each row with every other row. (Why?) Consider
the scalar product of a's row and b's row in A(G):


 a  b  c  d  e     a  b  c  d  e
[0, 1, 1, 1, 0] · [1, 0, 1, 1, 0] = 0×1 + 1×0 + 1×1 + 1×1 + 0×0 = 2        (2)


The product of two entries in this scalar product will be 1 if and
only if the two entries are both 1. Thus the value of the scalar product
is simply the number of positions where the two vectors both have a
1. With this observation, it is easy to compute all the scalar products
that form the entries of A^2(G).


              a  b  c  d  e
          a   3  2  1  1  1
          b   2  3  1  1  1
A^2(G) =  c   1  1  2  2  0        (3)
          d   1  1  2  3  1
          e   1  1  0  1  2


We now interpret the computation of the scalar product (2) in
terms of adjacencies in the graph. In (2), when rows a and b have a
1 in c's column, this means that a and b are both adjacent to c. From
(2) we see that a and b are both adjacent to nodes c and d. In general,
when two nodes n_i and n_j are adjacent to a common node n_k, then
(n_i, n_k, n_j) will be a path of length 2 between n_i and n_j. This proves
that the (i, j) entry in A^2(G) equals the number of paths of length 2
between the ith and jth nodes.

This property extends to higher powers of A(G). The entries of
A^3(G) tell how many paths of length 3 join different pairs of nodes.
For any positive integer m, the entries of A^m(G) tell how many paths
of length m join different pairs of nodes. Illustrative examples and
mathematical verification of this property of A^m(G) are left to the
exercises.
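
The path-counting property can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the helper names `adjacency_matrix` and `mat_mul` are illustrative choices, and the node order a, b, c, d, e is assumed to match the rows of A(G).

```python
# Nodes and edges of the graph in Figure 2.2, as listed in the text.
NODES = ["a", "b", "c", "d", "e"]
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
         ("b", "d"), ("d", "e"), ("e", "e")]


def adjacency_matrix(nodes, edges):
    """Build the symmetric 0-1 adjacency matrix of an undirected graph."""
    index = {name: k for k, name in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        A[index[u]][index[v]] = 1
        A[index[v]][index[u]] = 1
    return A


def mat_mul(X, Y):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


A = adjacency_matrix(NODES, EDGES)
A2 = mat_mul(A, A)    # entry (i, j): number of paths of length 2
A3 = mat_mul(A2, A)   # entry (i, j): number of paths of length 3

for name, row in zip(NODES, A2):
    print(name, row)
```

Entry (a, b) of `A2` is the scalar product computed in (2), and the only zero entries of `A2` are (c, e) and (e, c).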


A graph G is connected if every pair of nodes in G is joined by a
path. Using powers of A(G), we can determine whether or not a graph is
connected. If G has n nodes, any two nodes joined by a path are joined by
a path of length ≤ n − 1. So G is connected when there exist paths of
length ≤ n − 1 between all pairs of nodes. To determine whether all such
paths exist, we compute A^2, A^3, . . ., A^{n−1}(G) and check for each
(i, j) pair, i ≠ j, whether entry (i, j) is positive for some power of A.
For example, the graph G in Figure 2.2 is seen to be connected; all entries
but (c, e) and (e, c) are positive in A^2(G), and these two zero entries
become positive in A^3(G).
Summarizing our discussion of paths and connectedness, we have 


Graphs and Matrix Multiplication 


1. Let A(G) be the adjacency matrix of graph G. Then the entry
(i, j) in A^2(G) tells how many paths of length 2 join node i with
node j, and more generally, entry (i, j) in A^m(G) tells how many
paths of length m join node i with node j.

2. Let G be an n-vertex graph. G is connected if and only if for each
(i, j) pair, i ≠ j, entry (i, j) is positive in some power A^k, k = 1,
2, . . ., n − 1.


If D is a directed graph (in which an edge (a, b) goes only from a to
b), its adjacency matrix A(D) has a 1 in entry (i, j) if there is an edge from
node j to node i. Then the (i, j) entry in A^m(D) will tell how many directed
paths of length m there are in D from node j to node i.

There is one important scalar product which we shall use in Example
2 that merits discussion. If b is some vector b = [b_1, b_2, . . ., b_n] and 1 is
a vector of n 1's, the sum of the b_i can be written

\[
\begin{aligned}
\sum_i b_i &= b_1 + b_2 + \cdots + b_n \\
           &= 1 \times b_1 + 1 \times b_2 + \cdots + 1 \times b_n \\
           &= \mathbf{1} \cdot \mathbf{b}
\end{aligned}
\]


Example 2. Ranking Track Teams 


Suppose that there are five track teams, named Ants (A), Birds (B),
Cats (C), Dogs (D), and Elephants (E), that compete in nine meets.
The results of the competition are modeled by a directed graph G with
nine edges, in which there is an edge from A to B if A beat B (see
Figure 2.3). The question is: How to rank the five teams?

Figure 2.3

One approach, the one we will use, is to give two points to team
A for each team that A beats and to give one point to team A for each
case where team A beats a team, say D, that in turn beats another
team, say B (this way, team A gets several points when A beats a
"good" team that has beaten other teams). In the directed graph, node
A gets two points for each edge directed out from A and gets one point
for each path of length 2 directed out from A. Recall that the number
of paths of length 2 from node j to node i is entry (i, j) in A^2(G).

If 1 denotes a vector of five 1's and 2 denotes a vector of five
2's, then team A's total score will be the scalar product of 2 times A's
column in A(G) plus the scalar product of 1 times A's column in A^2(G).
To get the scores of all the teams at once, we multiply 2 times all
columns of A(G) (that is, compute 2A(G)) and add it to 1 times all
columns of A^2(G) (that is, compute 1A^2(G)). Our computations yield
(here we have already computed A^2(G) on a computer):


mu} 
0 0 
2A(G) + 1A*7(G) = [2 2 2 2 2]/0 O 
; v 
l 


a 2 & = & 
eo - —- © O&O 


—_— —_—_-—  - — — 
—_-_ O&O Oo CO = 
Oo = © NN mo 


= [2, 4, 6, 2, 4] + [1, 3, 4, 2, 4]

   A  B   C  D  E
= [3, 7, 10, 4, 8]


Based on this scoring system, the ranking of the teams would be
C first, E second, B third, D fourth, A fifth. The reader may want to
experiment with other ranking systems.
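
The scoring rule 2A(G) + 1A^2(G) can be sketched in Python as follows. The meet results of Figure 2.3 are not reproduced here, so the four-team tournament below (teams W, X, Y, Z) is purely hypothetical, as are the function names; the convention that entry (i, j) is 1 for an edge from node j to node i follows the text.

```python
def mat_mul(X, Y):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def ranking_scores(teams, results):
    """Score each team: 2 points per win plus 1 point per path of length 2.

    `results` is a list of (winner, loser) pairs, i.e. directed edges
    from winner to loser.
    """
    index = {t: k for k, t in enumerate(teams)}
    n = len(teams)
    A = [[0] * n for _ in range(n)]
    for winner, loser in results:
        A[index[loser]][index[winner]] = 1   # entry (i, j) = 1: edge from j to i
    A2 = mat_mul(A, A)
    wins = [sum(A[i][j] for i in range(n)) for j in range(n)]         # 1A
    two_step = [sum(A2[i][j] for i in range(n)) for j in range(n)]    # 1A^2
    return [2 * w + t for w, t in zip(wins, two_step)]


# Hypothetical results: W beat X and Y, X beat Y, Y beat Z.
TEAMS = ["W", "X", "Y", "Z"]
RESULTS = [("W", "X"), ("W", "Y"), ("X", "Y"), ("Y", "Z")]

scores = ranking_scores(TEAMS, RESULTS)
print(dict(zip(TEAMS, scores)))
```

Multiplying a row vector of 1's by a matrix just sums each column, which is why `wins` and `two_step` are computed as column sums.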
The fact that matrix multiplication can be used to answer questions
about paths in graphs is quite unexpected. But it helps prove the point that 
a very large number of diverse mathematical problems can be analyzed with 
matrices and linear models. This example also shows how useful it can be 
to interpret the meaning of mathematical operations, such as matrix multi- 
plication, in terms of the system being modeled. 

Examples 3 and 4 involve mathematical schemes to encode information 
in a fashion designed to detect, and if possible correct, errors introduced by 
noise when messages are transmitted. A binary code is a scheme for en- 
coding a letter or number as a binary sequence of 0’s and 1's and then 
decoding the binary sequence back into a letter or number. The binary se- 
quence often is transmitted over a communications channel with random 
noise that may change one of the digits in the binary sequence. That is, 
when a 1 is sent, a 0 may be received; or vice versa. We assume that the
chance of two errors in one binary sequence is small enough that it can be 
ignored. 

The examples about binary codes involve multiplication and addition 
mod 2. Because computers represent numbers in terms of 0’s and 1’s (a 
circuit is open or closed), it is very easy for computers to calculate mod 2. 


The following tables summarize the rules for addition and multiplication
mod 2.


+ | 0  1        × | 0  1
--+------       --+------
0 | 0  1        0 | 0  0
1 | 1  0        1 | 0  1


The sum of many 1's is 0 if there is an even number of 1's, and is 1 if
there is an odd number of 1's. For example, the following scalar product
is calculated in arithmetic mod 2:


[1, 1, 0, 1, 1] · [1, 0, 1, 1, 1]
    = 1×1 + 1×0 + 0×1 + 1×1 + 1×1
    = 1 + 0 + 0 + 1 + 1 (mod 2)
    = 1 (mod 2)


Example 3. Parity-Bit Code for Error Detection


Error-detecting binary codes are designed to make it possible to detect 
an error should one occur during transmission. Then the transmitter 
would be asked to send the binary sequence again. A standard error- 
detecting code used to transmit data over telephone lines between com- 
puters is a parity-bit code. 

The basic unit in such communication is usually a byte, an 8-bit
binary sequence. Let b = [b_1, b_2, . . ., b_8] be the byte to be sent. In
a parity-bit code, an additional 9th bit p is added to b to get the
sequence to be transmitted, c = [b_1, b_2, . . ., b_8, p]. This bit p is
normally chosen so that the number of 1's in c is even. Another way
to say this is: Pick p so that the sum of the bits in c equals 0 mod 2:


b_1 + b_2 + · · · + b_8 + p = 0 (mod 2)
or equivalently,
1 · c = 0 (mod 2)


For example, if the byte to be sent is b = [1, 0, 1, 0, 0, 1, 0, 0],
which has an odd number of 1's, then p = 1 and we send c = [1, 0,
1, 0, 0, 1, 0, 0, 1]. Suppose that the message we received was c' =
[1, 0, 0, 0, 0, 1, 0, 0, 1]; the third bit was erroneously changed to
0.

Whenever a message c' is received, we compute 1 · c'. If no
errors had occurred and c' = c, then we would find that 1 · c' =
0 mod 2. On the other hand, if 1 · c' = 1, as in this example, then
some digit was altered, and we ask the sender to retransmit.


A simple way to compute the proper value for the parity bit p is
to let


p = Σ b_i = 1 · b (mod 2)        (4)


That is, let p equal the sum mod 2 of the bits in b. If p = 1 · b =
1 mod 2, b has an odd number of 1's and making p equal to 1 will give c an
even number of 1's. If p = 1 · b = 0 mod 2, then b already has an
even number of 1's and we want p to be 0.
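
Formula (4) and the check 1 · c' can be tried out with a few lines of Python. This is an illustrative sketch; the function names are assumptions, and the byte and the corrupted message are the ones used in the example above.

```python
def dot_mod2(u, v):
    """Scalar product of two 0-1 vectors, computed mod 2."""
    return sum(x * y for x, y in zip(u, v)) % 2


def add_parity_bit(b):
    """Append p = 1 . b (mod 2) so that the result has an even number of 1's."""
    ones = [1] * len(b)
    return list(b) + [dot_mod2(ones, b)]


def passes_parity_check(c):
    """True when 1 . c = 0 (mod 2), i.e. no single-bit error is detected."""
    return dot_mod2([1] * len(c), c) == 0


b = [1, 0, 1, 0, 0, 1, 0, 0]               # byte to be sent
c = add_parity_bit(b)                      # transmitted sequence
c_received = [1, 0, 0, 0, 0, 1, 0, 0, 1]   # third bit changed in transit

print(c)
print(passes_parity_check(c), passes_parity_check(c_received))
```

The check reports that something went wrong but not where; locating the altered bit needs the extra parity bits of the next example.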


Example 4. Hamming Code for Error Correction 


More advanced error-correcting codes can actually correct an error
and reconstruct the original binary sequence that was sent. The
following scheme due to Hamming takes a 4-bit binary sequence and encodes
it as a 7-bit sequence. Let b = [b_1, b_2, b_3, b_4] be the binary message,
let p_1, p_2, p_3 be the parity-check bits, and let c = [c_1, c_2, . . ., c_7] be
the code sequence that will be transmitted. The parity-check bits are
chosen to satisfy the following three parity checks:


p_1 + b_1 + b_2 + b_4 = 0 (mod 2)
p_2 + b_1 + b_3 + b_4 = 0 (mod 2)        (5)
p_3 + b_2 + b_3 + b_4 = 0 (mod 2)

Let us encode these message and parity-check bits in the code sequence 
c as follows: 


Sec. 2.3 0-1 Matrices 


c_1 = p_1,  c_2 = p_2,  c_3 = b_1,  c_4 = p_3,
c_5 = b_2,  c_6 = b_3,  c_7 = b_4        (6)


The reason why c_3 = b_1, not p_3, will be clear shortly. Now (5) is


c_1 + c_3 + c_5 + c_7 = 0 (mod 2)
c_2 + c_3 + c_6 + c_7 = 0 (mod 2)        (7)
c_4 + c_5 + c_6 + c_7 = 0 (mod 2)


or
Mc = 0 (mod 2)


where 0 is a vector of all 0's and M is the matrix of coefficients in
(7):

\[
M = \begin{bmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{bmatrix} \tag{8}
\]

Each of the c_i's is involved in a different subset of parity checks,
so when (exactly) one c_i is altered in transmission, the parity checks
allow us to determine which bit was changed.

Since each of the parity bits c_1 (= p_1), c_2 (= p_2), c_4 (= p_3) is
in just one of the parity equations in (7), each can be determined as
the sum of the other bits in its equation (just as p = Σ b_i in Example
3). Summarizing how we go from the message vector b to the code
vector c, we obtain:


c_1 = b_1 + b_2 + b_4,   c_2 = b_1 + b_3 + b_4,   c_3 = b_1,
c_4 = b_2 + b_3 + b_4,   c_5 = b_2,   c_6 = b_3,        (9)
c_7 = b_4


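The following matrix-vector product (mod 2) does the encoding
specified by (9).

\[
c = Qb, \quad \text{where } Q = \begin{bmatrix}
1 & 1 & 0 & 1 \\
1 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix} \tag{10}
\]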

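For example, suppose that b = [1, 0, 1, 0]. Then

\[
c = Qb = \begin{bmatrix}
1 & 1 & 0 & 1 \\
1 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix}
1 \times 1 + 1 \times 0 + 0 \times 1 + 1 \times 0 \\
1 \times 1 + 0 \times 0 + 1 \times 1 + 1 \times 0 \\
1 \times 1 + 0 \times 0 + 0 \times 1 + 0 \times 0 \\
0 \times 1 + 1 \times 0 + 1 \times 1 + 1 \times 0 \\
0 \times 1 + 1 \times 0 + 0 \times 1 + 0 \times 0 \\
0 \times 1 + 0 \times 0 + 1 \times 1 + 0 \times 0 \\
0 \times 1 + 0 \times 0 + 0 \times 1 + 1 \times 0
\end{bmatrix}
= \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} \pmod 2 \tag{11}
\]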


Suppose that the transmission received was c' = [1, 0, 1, 1, 0,
0, 0]; the sixth bit was changed from 1 to 0. We compute the
vector e:

\[
e = Mc' = \begin{bmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \pmod 2 \tag{12}
\]

Note that Mc' is just the set of left sides in (7) with c replaced by c'.
If no error had occurred and c' = c, then e = Mc = 0, as in (7). If
e ≠ 0, as in (12), an error must have occurred. Depending on which
parity equations are now violated, we can figure out which bit in c'
was changed in transmission. We claim that e (= Mc') equals the
column of M corresponding to the bit of c that was changed. This is
the case in (12), where e equals the sixth column of M. The reason is
that when the kth bit is altered, exactly those equations involving the
kth bit (i.e., those rows of M with a 1 in the kth column) will now
equal 1 (mod 2).

As the reader has probably noticed, for each i, the ith column of
M is simply the binary representation of the number i (read from the
bottom entry up). Thus the vector e "spells out" the location of the bit
that was changed. To get the correct transmission, we simply change
back the bit in the position spelled out by e. In this instance we would
change the sixth bit from a 0 back to a 1.
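
The whole encode, check, and correct cycle can be sketched in Python. The matrices M and Q below are those of (8) and (10); the function names are illustrative assumptions, and the sketch handles only the case the text considers, namely at most one altered bit.

```python
# Parity-check matrix M of (8) and encoding matrix Q of (10).
M = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

Q = [[1, 1, 0, 1],
     [1, 0, 1, 1],
     [1, 0, 0, 0],
     [0, 1, 1, 1],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]


def mat_vec_mod2(A, x):
    """Matrix-vector product with all arithmetic done mod 2."""
    return [sum(a * v for a, v in zip(row, x)) % 2 for row in A]


def encode(b):
    """Encode a 4-bit message b as the 7-bit code vector c = Qb (mod 2)."""
    return mat_vec_mod2(Q, b)


def correct(received):
    """Flip back the bit whose position is spelled out by e = Mc' (mod 2)."""
    e = mat_vec_mod2(M, received)
    position = e[0] + 2 * e[1] + 4 * e[2]   # column of M read as a binary number
    fixed = list(received)
    if position != 0:
        fixed[position - 1] ^= 1
    return fixed


def decode(c):
    """Recover the message bits b_1..b_4 from positions 3, 5, 6, 7 of c."""
    return [c[2], c[4], c[5], c[6]]


c = encode([1, 0, 1, 0])
c_received = [1, 0, 1, 1, 0, 0, 0]          # sixth bit altered, as in (12)
print(c)
print(mat_vec_mod2(M, c_received))
print(decode(correct(c_received)))
```

If two bits are altered, `correct` will silently change the wrong bit, which is why the text assumes that double errors are rare enough to ignore.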




Optional 


Another useful matrix associated with a graph is the incidence matrix M(G). 


Incidence Matrix M(G) of a Graph G. M(G) has a row for each node
of G and a column for each edge of G. If the jth edge is incident to
the ith node (i.e., the ith node is an endpoint of the jth edge), then
entry $m_{ij} = 1$; entry $m_{ij} = 2$ if the jth edge is a loop at node i; and
otherwise, $m_{ij} = 0$. For the graph in Figure 2.2, M(G) is


Peo B Gam iG 
She Cet E jb ik *y 
(13) 
MiG»=clO | 8 Tf 8 & Q 
EAT By Mt + MPS IE: alta 
al a pe! a | he eae 


M(G) will always have exactly two 1’s in each column (or one 2), since an 
edge has two endpoints. 
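
As a small illustration, the following Python sketch builds an incidence matrix from an edge list and checks the column-sum property. The four-node graph used here is hypothetical (it is not the graph of Figure 2.2), and the function name incidence_matrix is an assumption of this sketch.

```python
# Build M(G) from a list of edges and check that every column sums to 2.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("a", "c")]  # hypothetical graph


def incidence_matrix(nodes, edges):
    """One row per node, one column per edge; a loop contributes a single 2."""
    index = {v: i for i, v in enumerate(nodes)}
    M = [[0] * len(edges) for _ in nodes]
    for j, (u, v) in enumerate(edges):
        M[index[u]][j] += 1
        M[index[v]][j] += 1  # when u == v this makes the entry 2
    return M


M = incidence_matrix(nodes, edges)
column_sums = [sum(row[j] for row in M) for j in range(len(edges))]
print(M)
print(column_sums)
```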

The next example shows how, using M(G), one can recast a graph 
optimization problem as a linear program. Although the reformulation is not 
hard to follow, it required considerable ingenuity to think it up. 


Example 5. Finding a Maximum Independent Set 


An independent set I of nodes in a graph is a set of nodes with no 
linking edges. For example, {a, e} is an independent set of nodes for 
the graph in Figure 2.2. Independent sets arise in various settings. For 
example, let G be a graph in which each node stands for a letter that 
can be transmitted over a noisy communications channel and an edge 
joins two nodes if the corresponding letters can be confused when 
transmitted (i.e., one letter is sent but another letter is received). An 
independent set would represent a set of letters that cannot be confused 
with one another. 

We shall now recast the property of being an independent set in
terms of a set of linear inequalities. We assume that G has n nodes.
The key step is to represent a set of nodes by a membership vector.
The membership vector $x = [x_1, x_2, \ldots, x_n]$ for a set I is defined to
have $x_i = 1$ if the ith node is in the set I and $x_i = 0$ otherwise.

We claim that x is the membership vector for an independent set
if and only if the following inequality holds, where $\mathbf{1}$ is a vector
of 1's and the inequality is term by term.

$$
xM(G) \le \mathbf{1} \tag{14}
$$

Recall that putting the vector before the matrix means that we
are forming the scalar product of x with each column of M(G); the
columns of M(G) correspond to the edges of G. For the graph in Figure
2.2, (14) becomes


Edge e;: 5 ae a a a | 
Edge e,: x 5 bE. = | 
Edge e,: ee +x, = 1 
Edge e,: 5 ides wae = | (15) 
Edge es: X;, TX, = | 
Edge e;: ci mee 2 = | 
Edge e;: 2X, | 


Recall that each column of M(G) has two 1's, which correspond to
the pair of nodes that form that edge. Then the left side of the first
inequality in (15) says that nodes a and b are the endpoints of edge
$e_1$. The condition that $x_a + x_b \le 1$ says that not both a and b can be
in the independent set I (i.e., not both $x_a = 1$ and $x_b = 1$).

We are ready to pose the problem of finding a maximum inde- 
pendent set of G as a linear program. In the graph model for confusing 
letters sent over a noisy communication channel, a maximum inde- 
pendent set would be the largest possible set of letters that can be sent 
without one being confused with another. 

We must restate the concept of maximizing the size of the independent
set in terms of the membership vector. What we want is to maximize
the number of 1's in the membership vector. Another way to
say this is to maximize the sum of the $x_i$'s. But $\sum x_i = \mathbf{1} \cdot x$. Combining
this fact with (14), we have the linear program

$$
\begin{aligned}
\text{Maximize} \quad & \mathbf{1} \cdot x \\
\text{subject to} \quad & xM(G) \le \mathbf{1} \\
\text{and} \quad & x_i = 0 \text{ or } 1
\end{aligned}
$$

Because of the integer constraint, such a linear program is called an
integer program. There is a large literature about solving integer
programs. All optimization problems in graph theory can be posed as
integer programs.
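
For a very small graph the integer program can be solved by simply trying every 0-1 membership vector. The sketch below does this for a hypothetical four-node graph (not the graph of Figure 2.2); exhaustive search is used only for illustration and is not one of the integer programming methods from the literature mentioned above.

```python
from itertools import product

# Incidence matrix of a hypothetical graph with nodes a, b, c, d (rows)
# and edges ab, bc, cd, ad, ac (columns).
M = [
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
]


def is_independent(x, M):
    """Check xM <= 1: the scalar product of x with each edge column is at most 1."""
    return all(
        sum(x[i] * M[i][j] for i in range(len(x))) <= 1
        for j in range(len(M[0]))
    )


def maximum_independent_set(M):
    """Maximize 1 . x over all 0-1 vectors x that satisfy xM <= 1."""
    best = None
    for x in product((0, 1), repeat=len(M)):
        if is_independent(x, M) and (best is None or sum(x) > sum(best)):
            best = x
    return best


best = maximum_independent_set(M)
print(best)
```

The search examines all 2^n membership vectors, so it is practical only for small n.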


Section 2.3 Exercises 


Summary of Exercises 

Exercises 1-10 deal with adjacency matrices and paths in graphs, Exercises
8-10 being "theoretical." Exercises 11-20 concern coding problems.
Exercises 21-23 deal with the optional material at the end of the section.

1. Draw the graphs with the following adjacency matrices.
e BO. Oh & 4 peek. LL @. 6 
Oo F Q Y io 4 i | ee ee a | 
(a) (b) 
Gt “Gg te ee (c) ee OS) US Cae 
1 0 0 0 Pye b Coa 4 
0 T oY. aa 
Ort 2 tae | 
2. Write the adjacency matrices for the following graphs.

[Figures: graphs (a) G1, (b) G2, (c) G3, (d) G4, (e) G5, (f) G6]


3. Compute the square of the adjacency matrix for the following graphs
from Exercise 2.

(a) G1 (b) G2 (c) G3 (d) G4 (e) G5 (f) G6

Use your answer to tell how many paths of length 2 there are in each
graph between vertex a and vertex d.


4. Compute the cube of the adjacency matrix for the following graphs
from Exercise 2.

(a) G1 (b) G2 (c) G3 (d) G4 (e) G5 (f) G6

Note: You may use a computer program to do this computation.
Use your answer here along with that in Exercise 3 to tell if all
vertices in the graph are joined by a path of length ≤ 3.


5. Use your calculation in Exercise 4 to show that G, and G, are
connected.


6. Direct the edges in the graphs G,, G, in Exercise 2 from the earlier
node to the later node according to the alphabetical order of the nodes;
for example, edge (a, c) goes from a to c.
(a) Write the adjacency matrix for graphs G, and G, and compute the
square of each.

(b) Give an argument to explain why entry (i, j) in the square A²(D)
of the adjacency matrix A(D) of a directed graph D tells how many
directed paths of length 2 there are in D from j to i.


7. For the directed versions (see Exercise 6) of graphs G,, G, in Exercise
2, compute the total points for each node according to the method in
Example 2.


8. For an undirected graph G, show that the result of the matrix-vector
product A(G)1 (where 1 is a vector of 1's) is a vector in which the ith
position tells how many nodes are adjacent to node i.


9. (a) Explain in words why, if entry (i, j) in A(G)² is the number of
paths of length 2 between nodes i and j in graph G, entry (i, j) in
A(G)³ is the number of paths of length 3 between nodes i and j
in G.

(b) (Advanced) Extend the argument in part (a) by induction to show
that entry (i, j) in A(G)^k is the number of paths of length k between
nodes i and j in G.


10. (a) Suppose that we redefined the adjacency matrix A(G) so that the
diagonal entries (i, i) were all 1. Now what is the interpretation of
entry (i, j) being positive in A²(G)? In A³(G)?

(b) Compute A²(G) for this redefined adjacency matrix for graph G,
in Exercise 2.


11. If the following bytes (8-bit binary sequences) are being sent in a
parity-check code, what should the additional parity bit be (0 or 1)?
(a) 10101010 (b) 11100110 (c) 00000000


12. Suppose that the following 9-bit messages were received in a
parity-check code. Which messages were altered during transmission?
(a) 101010101 (b) 101101101 (c) 000000000


13. Explain why if two errors occur during transmission, a parity-check
code of the sort in Example 3 will not detect an error.


14. Suppose that in the Hamming code in Example 4, the following 4-bit
binary messages b = [b1, b2, b3, b4] are to be sent. What will the
coded 7-bit message c = [c1, c2, c3, c4, c5, c6, c7] be?

(a) b = [1, 1, 0, 0] (b) b = [1, 1, 1, 0] (c) b = [1, 1, 1, 1]


15. Suppose that in the Hamming code in Example 4, the following
messages c' are received. In each case compute the error vector e = Mc'
and from it tell which bit, if any, was changed in transmission.

(a) [0, 0, 1, 1, 1, 1, 0] (b) Pls Tes 4,05 1)

(c) [0, 1, 0, 0, 0, 0, 0]
16. Assume that at most one error occurred in transmission of the Hamming
code in Example 4. What message was originally sent if the following
message was received?

(a) [0, 0, 1, 1, 1, 1, 0] (b) [0, 1, 0, 0, 0, 0, 0, 0]

(c) (1. 1518.8, TO


17. Suppose that we let c1 = p1, c2 = p2, c3 = p3, c4 = b1, c5 = b2,
c6 = b3, c7 = b4, instead of the encoding scheme in (6). With this
encoding scheme, what is the new Q matrix in (10)? If b = [1, 1, 0,
0], what would c be?


18. (a) Explain why the Hamming code in Example 4 will always detect
two errors; that is, if two bits in the code are changed, the error
vector e cannot be all 0's.

(b) Give an example to show that the Hamming code in Example 4
cannot correct two errors.

(c) Give an example to show that the Hamming code cannot always
detect three errors.


19. The Hamming code in Example 4 can be extended to a similar code for
15-bit sequences: 11 message bits and 4 parity-check bits. Write out
the system of parity-check equations (or equivalently, the matrix of
coefficients for these equations) for a 15-bit Hamming code.


20. Another way to encode a binary sequence is by treating the sequence
as the coefficients of a polynomial p(x); for example, the sequence
[1, 0, 1, 1] yields the polynomial p(x) = 1 + 0x + 1x² + 1x³. We
encode by multiplying p(x) by some other polynomial g(x) to get the
polynomial p*(x) = g(x)p(x), whose coefficients we transmit. For example,
let g(x) = 1 + x. Then for p(x) = 1 + 0x + 1x² + 1x³, we compute
p*(x) = g(x)p(x) = (1 + x)(1 + 0x + 1x² + 1x³) = 1 +
1x + 1x² + 0x³ + 1x⁴. (Remember that arithmetic is mod 2.) So we
transmit [1, 1, 1, 0, 1]. To decode, we divide the polynomial p*(x) by
g(x). If any error occurred in transmission, there will be a remainder;
this tells us that an error occurred.
(a) Using g(x) = 1 + x, perform a polynomial encoding of the
sequence [1, 1, 0, 1].
(b) Suppose that the following messages are received, based on this
polynomial encoding with g(x) = 1 + x. Which ones have errors?
(i) [1, 1, 0, 1, 0] (ii) [1, 0, 1, 0] (iii) [1, 0, 0, 0]
(c) (Advanced) Show that the parity of messages transmitted is always
even with the polynomial encoding scheme with g(x) = 1 + x.


Hint: By setting x = 1 in p*(x) [= g(x)p(x)], one can sum the
coefficients (i.e., the message bits).


Exercises for Optional Material 


21. Write the incidence matrix M(G) for the following graphs in
Exercise 2.

(a) G, (b) G, (c) G.

22. Write out the linear program for finding a maximum independent set in
the following graphs in Exercise 2.
(a) G, (b) G, (c) G;


23. Show that 1 · M(G) = 2 · 1 = [2, 2, 2, ..., 2] for any graph.


Section 2.4 Matrix Algebra


In this section we introduce the algebra of matrices. Algebraic techniques
are used in high school to manipulate, simplify, and solve equations, such
as rewriting 2x - 2y = 4 as y = x - 2. Similar methods exist for matrices.

In Sections 2.2 and 2.3 we used matrix notation to express systems of 
linear equations. For example, in Example 2 of Section 2.2 we wrote the 
refinery equations 


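$$
\begin{aligned}
20x_1 + 4x_2 + 4x_3 &= 500 \\
10x_1 + 14x_2 + 5x_3 &= 850 \\
5x_1 + 5x_2 + 12x_3 &= 1000
\end{aligned}
\tag{1}
$$

as

$$
\begin{bmatrix} 20 & 4 & 4 \\ 10 & 14 & 5 \\ 5 & 5 & 12 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 20x_1 + 4x_2 + 4x_3 \\ 10x_1 + 14x_2 + 5x_3 \\ 5x_1 + 5x_2 + 12x_3 \end{bmatrix}
=
\begin{bmatrix} 500 \\ 850 \\ 1000 \end{bmatrix}
$$

or, in matrix notation,

$$
Ax = b \tag{2}
$$

where A is the 3-by-3 matrix of coefficients, b is the right-side vector, and
x is the vector of unknowns.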

Just as the matrix notation of Ax = b gives a concise way to write 
the system of equations in (1), so matrix algebra provides a concise, powerful 
way to manipulate and solve matrix equations. As one would expect, the 
rules of matrix algebra are basically extensions of single-variable high school 
algebra. We start with some examples that illustrate the power of matrix 
algebra. 

First we need to define the ones vector $\mathbf{1}$ and identity matrix I. The
vector $\mathbf{1}$ is simply a vector of all 1's:

$$
\mathbf{1} = [1, 1, \ldots, 1]
$$

When we write a scalar product such as $\mathbf{1} \cdot b$, we assume that the ones
vector $\mathbf{1}$ has the same length as b (so that the product makes sense). As
noted in Section 2.3, the product $\mathbf{1} \cdot b$ is simply the sum of the elements in
the vector $b = [b_1, b_2, \ldots, b_n]$.

$$
\mathbf{1} \cdot b = b_1 + b_2 + \cdots + b_n \tag{3}
$$


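The identity matrix I, always a square matrix, has 1's on the main
diagonal and zeros elsewhere.

$$
I = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
$$

Multiplying a vector b by I leaves b unchanged:

$$
Ib =
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix}
=
\begin{bmatrix}
1 \times b_1 + 0 \times b_2 + 0 \times b_3 + \cdots + 0 \times b_n \\
0 \times b_1 + 1 \times b_2 + 0 \times b_3 + \cdots + 0 \times b_n \\
0 \times b_1 + 0 \times b_2 + 1 \times b_3 + \cdots + 0 \times b_n \\
\vdots \\
0 \times b_1 + 0 \times b_2 + 0 \times b_3 + \cdots + 1 \times b_n
\end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix}
$$

In a similar way, one can verify that bI = b. So we have

$$
Ib = b = bI \tag{4}
$$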


This is why I is called the identity matrix. As with the ones vector 1, we 
assume that the size of I equals the length of b. 
Equation (4) extends to matrices. That is, 


IB = B = BI (5) 


for any matrix B. Note that if B is an m-by-r matrix, then the I on the
left side of B must be m-by-m, while the I on the right side of B must be
r-by-r.

We can use matrix algebra to verify (5). The columns of the matrix
product IB equal the matrix-vector products of I with each column $b_j^C$ of
B. For example, the first column in IB is the matrix-vector product $Ib_1^C$,
which by (4) equals $b_1^C$. In matrix notation we write

$$
\begin{aligned}
IB = I[b_1^C, b_2^C, b_3^C, \ldots, b_r^C] &= [Ib_1^C, Ib_2^C, Ib_3^C, \ldots, Ib_r^C] \\
&= [b_1^C, b_2^C, b_3^C, \ldots, b_r^C] = B
\end{aligned}
\tag{6}
$$


Example 1. Matrix Algebra in the Leontief
Economic Model


In Example 4 of Section 2.2 we used matrix notation to represent the
system of equations in the Leontief economic model

                        Industrial Demands
                                                        Consumer
Supply           Energy   Constr.   Transp.   Steel     Demand
Energy:      x1 =  .4x1  +  .2x2  +  .2x3  +  .2x4   +   100
Construct.:  x2 =  .3x1  +  .3x2  +  .2x3  +  .1x4   +    50     (7)
Transport.:  x3 =  .1x1  +  .1x2           +  .2x4   +   100
Steel:       x4 =           .1x2  +  .1x3            +     0

as

$$
x = Dx + c \tag{8}
$$

where D is the matrix of coefficients for interindustry demands, c =
[100, 50, 100, 0] is the vector of consumer demand, and x is the vector
of (unknown) production levels $x_i$.

The standard way to write a system of linear equations is with
the $x_i$ all on the left side and constants on the right. If we bring all the
$x_i$ over to the left side, (7) becomes

$$
\begin{aligned}
.6x_1 - .2x_2 - .2x_3 - .2x_4 &= 100 \\
-.3x_1 + .7x_2 - .2x_3 - .1x_4 &= 50 \\
-.1x_1 - .1x_2 + x_3 - .2x_4 &= 100 \\
-.1x_2 - .1x_3 + x_4 &= 0
\end{aligned}
\tag{9}
$$


We can also shift the $x_i$ to the left side in matrix notation. We
rewrite (8)

$$
x = Dx + c \;\rightarrow\; x - Dx = c \tag{10}
$$

Recall from (4) that x = Ix (I is the identity matrix). Using this fact,
we can rewrite (10)

$$
x - Dx = c \;\rightarrow\; Ix - Dx = c \;\rightarrow\; (I - D)x = c \tag{11}
$$


Writing I - D out, we have

$$
I - D =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
-
\begin{bmatrix} .4 & .2 & .2 & .2 \\ .3 & .3 & .2 & .1 \\ .1 & .1 & 0 & .2 \\ 0 & .1 & .1 & 0 \end{bmatrix}
=
\begin{bmatrix} .6 & -.2 & -.2 & -.2 \\ -.3 & .7 & -.2 & -.1 \\ -.1 & -.1 & 1 & -.2 \\ 0 & -.1 & -.1 & 1 \end{bmatrix}
\tag{12}
$$

The resulting matrix in (12) is the coefficient matrix of (9).

In Section 1.2 we stated Leontief's input constraint that every
industry be profitable; that is, making 1 dollar's worth of the ith
commodity should cost less than 1 dollar. Recall that the input costs of
energy are the coefficients in the Energy column of the demand matrix
D; similarly for other commodities. So Leontief's constraint is that
each column sum in D should be less than 1.

We now develop a compact matrix inequality that expresses this
constraint on D. Using the ones vector $\mathbf{1}$, we can write Leontief's
constraint (where $d_j^C$ is the jth column of D):

$$
\text{Input constraint:} \quad \mathbf{1} \cdot d_j^C = d_{1j} + d_{2j} + d_{3j} + d_{4j} < 1 \tag{13}
$$

[This use of $\mathbf{1}$ in summing was discussed in (3).] Combining (13) for
all columns, we have

$$
[\mathbf{1} \cdot d_1^C,\ \mathbf{1} \cdot d_2^C,\ \mathbf{1} \cdot d_3^C,\ \mathbf{1} \cdot d_4^C] < [1, 1, 1, 1] = \mathbf{1}
$$

The vector inequality < means term-by-term inequality. Factoring $\mathbf{1}$
out in front, as with one-variable expressions, we have

$$
\mathbf{1}[d_1^C, d_2^C, d_3^C, d_4^C] < \mathbf{1}
\quad \text{or} \quad
\mathbf{1}D < \mathbf{1} \tag{14}
$$

This is the compact mathematical way to say that all column sums are
less than 1 in the interindustry matrix D. In Section 3.4 we shall use
$\mathbf{1}D < \mathbf{1}$ to prove that all Leontief economic models have a solution.
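
The two matrix computations of this example, forming I - D and checking the column-sum constraint, can be verified with a short Python sketch. The entries of D are those of (7); exact fractions are used so that no rounding enters, and the variable names are assumptions of this sketch.

```python
from fractions import Fraction

# Interindustry demand matrix D from (7), entered row by row as exact fractions.
D = [[Fraction(s) for s in row.split()] for row in (
    "0.4 0.2 0.2 0.2",
    "0.3 0.3 0.2 0.1",
    "0.1 0.1 0   0.2",
    "0   0.1 0.1 0",
)]
n = len(D)

# I - D, the coefficient matrix of system (9).
I_minus_D = [[(1 if i == j else 0) - D[i][j] for j in range(n)] for i in range(n)]

# The row vector 1D: entry j is the sum of column j of D.
column_sums = [sum(D[i][j] for i in range(n)) for j in range(n)]

# Leontief's input constraint 1D < 1, checked term by term.
satisfies_constraint = all(s < 1 for s in column_sums)

print(column_sums, satisfies_constraint)
```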


Example 2. Matrix Algebra in Markov Chains 


We showed in Section 2.2 that the transition equations for a Markov
chain can be written in matrix notation as

$$
p' = Ap \tag{15}
$$

or, for the ith entry,

$$
p_i' = a_i^R \cdot p \tag{16}
$$

where A is the Markov transition matrix (with ith row $a_i^R$), p is the
vector of the current probability distribution, and p' is the vector of
the next-period probability distribution. For the frog Markov chain,
the system of equations p' = Ap is

$$
\begin{aligned}
p_1' &= .50p_1 + .25p_2 \\
p_2' &= .50p_1 + .50p_2 + .25p_3 \\
p_3' &= .25p_2 + .50p_3 + .25p_4 \\
p_4' &= .25p_3 + .50p_4 + .25p_5 \\
p_5' &= .25p_4 + .50p_5 + .50p_6 \\
p_6' &= .25p_5 + .50p_6
\end{aligned}
$$

The next-period calculations represented by (15) can be repeated to
find the distribution p'' after two periods. Using matrix algebra, we
have

$$
p'' = Ap' = A(Ap) = A^2p
$$

and after n periods, the distribution vector $p^{(n)}$ is

$$
p^{(n)} = A^np
$$


The current probabilities $p_i$, the entries in p, must sum to 1. We
can express this fact with the ones vector $\mathbf{1}$ as

$$
\mathbf{1} \cdot p = p_1 + p_2 + \cdots + p_n = 1 \tag{17}
$$

The entries in each column $a_j^C$ of A must also sum to 1.

$$
\mathbf{1} \cdot a_j^C = a_{1j} + a_{2j} + \cdots + a_{nj} = 1 \tag{18}
$$

Combining all the columns together into A, we see that (18) yields

$$
\mathbf{1}A = [\mathbf{1} \cdot a_1^C,\ \mathbf{1} \cdot a_2^C,\ \ldots,\ \mathbf{1} \cdot a_n^C] = [1, 1, \ldots, 1] = \mathbf{1} \tag{19}
$$

Matrix algebra allows us first to represent the column sum being
1 concisely and then also allows us to state the fact for all columns at
once as $\mathbf{1}A = \mathbf{1}$.

Equations p' = Ap, $\mathbf{1} \cdot p = 1$, and $\mathbf{1}A = \mathbf{1}$ can be used to
show that in the next-period distribution vector p', the entries $p_i'$ also
sum to 1. That is, we want to prove $\mathbf{1} \cdot p' = 1$:

$$
\begin{aligned}
\mathbf{1} \cdot p' &= \mathbf{1} \cdot (Ap) && \text{since } p' = Ap \\
&= (\mathbf{1}A) \cdot p = \mathbf{1} \cdot p && \text{since } \mathbf{1}A = \mathbf{1} \\
&= 1 && \text{since } \mathbf{1} \cdot p = 1
\end{aligned}
\tag{20}
$$

This argument can be repeated to show that the entries sum to 1 in p'',
the distribution vector in two periods, and more generally in $p^{(n)}$.
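
A numerical check of these facts for the frog Markov chain is sketched below in Python. The transition matrix is read off from the frog equations above; the starting distribution (all probability in state 3) is a hypothetical choice, and exact fractions are used so that the sums come out to exactly 1.

```python
from fractions import Fraction

h, q = Fraction(1, 2), Fraction(1, 4)

# Transition matrix A of the frog Markov chain (row i gives the equation for p_i').
A = [
    [h, q, 0, 0, 0, 0],
    [h, h, q, 0, 0, 0],
    [0, q, h, q, 0, 0],
    [0, 0, q, h, q, 0],
    [0, 0, 0, q, h, h],
    [0, 0, 0, 0, q, h],
]


def column_sums(A):
    """The row vector 1A."""
    return [sum(row[j] for row in A) for j in range(len(A[0]))]


def step(A, p):
    """Next-period distribution p' = Ap."""
    return [sum(a * x for a, x in zip(row, p)) for row in A]


p = [0, 0, 1, 0, 0, 0]   # hypothetical starting distribution
p1 = step(A, p)          # p'
p2 = step(A, p1)         # p'' = A(Ap)

print(column_sums(A))
print(p1, sum(p1))
print(p2, sum(p2))
```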


Transpose of a Matrix and 
Symmetric Matrices 


The operation of transposing a matrix has many theoretical and practical 
uses. In this book, its primary use is in computing pseudoinverses (in Section 
5.3).

The transpose of a matrix A, written $A^T$, is the matrix obtained from
A by interchanging rows and columns. Another way to think of it is flipping
A's entries around the main diagonal. For example, if

$$
A = \begin{bmatrix} 1 & 2 \\ 4 & 5 \\ 7 & 8 \end{bmatrix},
\quad \text{then} \quad
A^T = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \end{bmatrix}
$$

Transposes have the following properties:

$$
A^T + B^T = (A + B)^T \tag{21}
$$

$$
(AB)^T = B^TA^T \quad \text{and} \quad (Ab)^T = b^TA^T \tag{22}
$$

$$
(A^T)^T = A \tag{23}
$$

The order of multiplying A and B is reversed on the right side of (22) because
transposing reverses the roles of rows and columns: If A is m-by-r and B is
r-by-n, then $A^T$ is r-by-m and $B^T$ is n-by-r. We use the notation $b^T$ in the
second part of (22) to emphasize the change of b's role from a column vector
to a row vector.


Proof of $(AB)^T = B^TA^T$. We must show, for any i, j, that entry (i, j)
in $(AB)^T$ equals entry (i, j) in $B^TA^T$. Entry (i, j) in $B^TA^T$ equals the
scalar product of the ith row of $B^T$ (= ith column of B) times the jth
column of $A^T$ (= jth row of A). So we have

$$
\text{entry } (i, j) \text{ of } B^TA^T = b_i^C \cdot a_j^R
$$

and

$$
\text{entry } (i, j) \text{ in } (AB)^T = \text{entry } (j, i) \text{ in } AB = a_j^R \cdot b_i^C
$$

Since $b_i^C \cdot a_j^R = a_j^R \cdot b_i^C$, the identity is proved.
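
The identity can also be spot-checked numerically. In the Python sketch below, A is the 3-by-2 matrix from the transpose example above, B is a hypothetical 2-by-3 matrix, and the helpers transpose and matmul are written out for the sketch.

```python
def transpose(A):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Entry (i, j) is the scalar product of row i of A with column j of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2], [4, 5], [7, 8]]   # 3-by-2
B = [[1, 0, 2], [3, 1, 0]]     # 2-by-3, hypothetical

left = transpose(matmul(A, B))               # (AB)^T
right = matmul(transpose(B), transpose(A))   # B^T A^T

print(left)
print(right)
```

Note that the product in the other order, $A^TB^T$, is a 2-by-2 matrix here, so it cannot equal the 3-by-3 matrix $(AB)^T$.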
One of the most useful properties a matrix may have is symmetry. A
matrix A is symmetric if $A = A^T$. The adjacency matrix A(G) of a graph
G, introduced in Section 2.3, is a symmetric matrix. Of course, a symmetric
matrix must be a square matrix. Symmetric matrices have many nice
theoretical and computational properties. If A is a symmetric matrix, all the
information in the matrix is contained on or above the main diagonal.

A familiar example of a symmetric matrix is a mileage chart on a road
map. Because of its symmetric structure, only the upper (or lower) triangular
portion of this matrix is usually given. Symmetric matrices are very common
in physical sciences applications.

There is a useful symmetric matrix associated with any unsymmetric
matrix, the matrix $A^TA$. Entry (i, j) in $A^TA$ will be the scalar product
$a_i^C \cdot a_j^C$ of the ith and jth columns of A. Since $a_i^C \cdot a_j^C = a_j^C \cdot a_i^C$, entry
(i, j) and entry (j, i) in $A^TA$ are the same. So $A^TA$ will be symmetric. (One
computes scalar products of pairs of rows in the related symmetric matrix
$AA^T$.)

There are many problems where one wants to measure in some informal
way how similar various pairs of columns are in a matrix. Scalar products
of the columns, as computed in $A^TA$, provide one good measure.


Example 3. Scalar Products of Columns as a
Similarity Measure


Suppose that five students A, B, C, D, E have been asked to rate six 
subjects—linguistics, mathematics, necromancy, optometry, philoso- 
phy, and quantum mechanics—as subjects they like (rating = 1) or 


as subjects they do not like (rating = -1). The following rating matrix
R was obtained. 


I 

] 
_ Necr } -] | —|] =i] (24) 
€ | 

l 


To measure the similarity of interests among students, we want to use
the scalar product of pairs of columns. Observe that the scalar product
will be positive if two students' ratings tend to agree and will be
negative if they tend to disagree. To get these scalar products, we
simply compute $R^TR$. Since $R^TR$ is symmetric, we only need to compute
the entries on or above the main diagonal.

This computation yields

          A    6    0    2   -4    4
          B         6   -4    2    2
R^T R =   C              6    0    0       (25)
          D                   6   -2
          E                        6

Computing $RR^T$ would yield scalar products of pairs of rows in
R. These products would measure how similar different pairs of subjects
are perceived to be; that is, a large positive number would mean
that most students give the same rating to the two subjects.
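
The same computation is easy to try on a small example. The Python sketch below uses a hypothetical rating matrix for four subjects (rows) and three students (columns), not the matrix R of (24); the helper names are assumptions of the sketch.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical ratings: rows are subjects, columns are students; 1 = like, -1 = dislike.
R = [
    [1, 1, -1],
    [1, -1, -1],
    [-1, 1, 1],
    [1, 1, -1],
]

student_similarity = matmul(transpose(R), R)   # R^T R: scalar products of columns
subject_similarity = matmul(R, transpose(R))   # R R^T: scalar products of rows

print(student_similarity)
print(subject_similarity)
```

Here the first and third students disagree on every subject, which shows up as the entry -4, while each diagonal entry of the first product equals the number of subjects.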


We close our discussion of symmetric matrices with a very special, 
and simple, type of symmetric matrix. A diagonal matrix is a matrix with 
nonzero entries only on the main diagonal. The identity matrix is a diagonal 
matrix. If one premultiplies a matrix A by a diagonal matrix D, forming DA, the
effect is to multiply the ith row of A by the ith diagonal element in D:


2 Od 2) JoaxechtOxs° Se2OxXA) fad 10 
OSs al Poxt + 8xo° ON oxat 14 32 
(26) 


Postmultiplying by a diagonal matrix has a similar effect on the columns 
(see the Exercises). 


Rules of Matrix Algebra 


In Example 2 the step $\mathbf{1} \cdot (Ap) = (\mathbf{1}A) \cdot p$ [going from line 1 to line 2 in
(20)] was not justified. Similarly, in Example 1 we wrote Ix - Dx =
(I - D)x without explanation. These are common algebraic manipulations
for single-variable equations, the associative and the distributive laws,
respectively. We were implicitly assuming that these laws are also valid in
matrix algebra.

The rest of this section is devoted to a quick summary of what algebraic
manipulations are and are not valid for matrices. In the following we assume
that our vectors and matrices have the right sizes (so that operations make
sense).

Since all the basic rules are stated in terms of equations, their proofs
consist in showing that the (i, j) entry of the matrix on the left side equals
the (i, j) entry in the matrix on the right side. One such proof is worked out
to show both the technique of proving matrix equality and the power of the
notation we have developed. The other proofs are left to the Exercises.


Commutative Law. Matrix addition is commutative, but matrix
multiplication is not commutative.

$$
A + B = B + A \quad \text{but, in general,} \quad AB \ne BA \tag{27}
$$

Distributive Law

$$
A(B + C) = AB + AC \quad \text{and} \quad (B + C)A = BA + CA \tag{28}
$$

Proof of A(B + C) = AB + AC. We must show that the (i, j) entry
of the product matrix A(B + C) equals the (i, j) entry of the sum of
products AB + AC. This entry in A(B + C) is the scalar product of
the ith row of A times the jth column of B + C (which is $b_j^C + c_j^C$),
that is, $a_i^R \cdot (b_j^C + c_j^C)$. Now we must write out this symbolic scalar
product term by term and use the distributive law for scalars (this gets
a bit messy).

$$
\begin{aligned}
a_i^R \cdot (b_j^C + c_j^C) &= \sum_k a_{ik}(b_{kj} + c_{kj}) = \sum_k a_{ik}b_{kj} + \sum_k a_{ik}c_{kj} \\
&= a_i^R \cdot b_j^C + a_i^R \cdot c_j^C
\end{aligned}
\tag{28a}
$$

and $a_i^R \cdot b_j^C + a_i^R \cdot c_j^C$ is exactly the (i, j) entry in AB + AC.

Associative Law

$$
(AB)C = A(BC) \tag{29}
$$

There is one new property for matrices, scalar factoring. If r is a scalar
(a single number), then

Scalar Factoring

$$
r(AB) = (rA)B = A(rB) \tag{30}
$$
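
These rules can be spot-checked on small matrices. The Python sketch below uses three hypothetical 2-by-2 matrices and hand-written helpers (matmul, matadd, scale); a check on one example illustrates the laws but of course does not prove them.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(r, A):
    return [[r * a for a in row] for row in A]


# Hypothetical 2-by-2 matrices.
A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
C = [[2, 0], [0, 3]]
r = 5

AB, BA = matmul(A, B), matmul(B, A)
print(AB, BA)   # the two products differ, so multiplication is not commutative

distributive = matmul(A, matadd(B, C)) == matadd(matmul(A, B), matmul(A, C))
associative = matmul(matmul(A, B), C) == matmul(A, matmul(B, C))
scalar_factoring = scale(r, AB) == matmul(scale(r, A), B) == matmul(A, scale(r, B))
print(distributive, associative, scalar_factoring)
```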


A vector is just a matrix with one row or one column. The rules for
matrix multiplication thus apply to matrix-vector multiplication. For
completeness, we restate them.

$$
(AB)c = A(Bc) \quad \text{and} \quad a(BC) = (aB)C \tag{31}
$$

$$
A(b + c) = Ab + Ac \quad \text{and} \quad (c + d)A = cA + dA \tag{32}
$$

$$
a(B + C) = aB + aC \quad \text{and} \quad (C + D)a = Ca + Da \tag{33}
$$

$$
(rA)b = A(rb) = r(Ab) \quad \text{and} \quad (rb)A = b(rA) = r(bA) \tag{34}
$$

In this book we have not made a major distinction between a vector x
being a column vector or being a row vector. However, in complex products
involving matrices and vectors, it is essential to treat each vector as an
n-by-1 or a 1-by-n matrix (whichever is appropriate) and then treat the result
of a matrix-vector product, such as Ax (where A is m-by-n), as an m-by-1
matrix. For example, the following equality is false:

$$
(Ab)C \ne A(bC) \tag{35}
$$

since Ab yields a column vector and C should be premultiplied by a row
vector, and similarly for bC and A. For the same reason

$$
(Ab) \cdot (Cd) \ne (A(bC)) \cdot d \tag{36}
$$


See the Exercises for specific counterexamples of (35) and (36).
On the other hand, the following equality is valid:

$$
(aB) \cdot c = a \cdot (Bc) \tag{37}
$$

because no vector changes roles from a column to a row vector.
Near the beginning of this section, we introduced the vector $\mathbf{1}$ of all
1's and the identity matrix I. We noted

$$
\mathbf{1} \cdot a = a_1 + a_2 + \cdots + a_n \tag{38}
$$

and

$$
aI = a = Ia \quad \text{and} \quad IA = A = AI \tag{39}
$$

A related vector and matrix is the 0's vector $\mathbf{0}$ of all 0's and the O
matrix of all 0's. It is immediate that

$$
\mathbf{0} \cdot a = 0 \qquad Oa = \mathbf{0} \qquad OA = O \tag{40}
$$


There is another special vector that will be used in this book. The ith
unit vector, denoted $e_i$, is the ith column in the identity matrix I. Vector $e_i$
has a 1 in the ith entry and 0's elsewhere. This vector has the following
useful property:

$$
\begin{aligned}
Ae_i &= a_i^C \quad \text{(the ith column of A)} \\
e_iA &= a_i^R \quad \text{(the ith row of A)}
\end{aligned}
\tag{41}
$$
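
A quick way to see property (41) is to try it on a small matrix. In the Python sketch below the 3-by-3 matrix A is hypothetical and the helper names are assumptions of the sketch.

```python
def unit_vector(i, n):
    """The ith unit vector of length n (i counted from 1)."""
    return [1 if k == i - 1 else 0 for k in range(n)]


def mat_times_col(A, v):
    """A v, treating v as a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def row_times_mat(v, A):
    """v A, treating v as a row vector."""
    return [sum(x * row[j] for x, row in zip(v, A)) for j in range(len(A[0]))]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical matrix
e2 = unit_vector(2, 3)

print(mat_times_col(A, e2))   # second column of A
print(row_times_mat(e2, A))   # second row of A
```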


Vector sums and products (scalar products) also behave nicely. In fact,
scalar products are commutative as well as distributive and have scalar
factoring.

$$
a + b = b + a \tag{42}
$$

$$
a \cdot b = b \cdot a \tag{43}
$$

$$
a \cdot (b + c) = a \cdot b + a \cdot c \tag{44}
$$

$$
r(a \cdot b) = (ra) \cdot b = a \cdot (rb) \tag{45}
$$

Note that in the distributive law (44), addition is vector addition on the left
and scalar addition on the right.

There is no associative law for scalar products, since the expression
$(a \cdot b) \cdot c$ is nonsense: $(a \cdot b)$ is a scalar, not a vector. There is a related
expression that looks reasonable but is not valid.

$$
(a \cdot b)c \ne a(b \cdot c) \tag{46}
$$
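
A small numerical instance makes (46) concrete. The three vectors in the Python sketch below are hypothetical.

```python
def dot(u, v):
    """Scalar product of two vectors of the same length."""
    return sum(x * y for x, y in zip(u, v))


def scale(r, v):
    """Scalar r times vector v."""
    return [r * x for x in v]


a, b, c = [1, 2], [3, 4], [5, 6]   # hypothetical vectors

left = scale(dot(a, b), c)    # (a . b)c, a multiple of c
right = scale(dot(b, c), a)   # a(b . c), a multiple of a

print(left, right)
```

The left side is a multiple of c and the right side is a multiple of a, so the two can agree only in special cases.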
To sum up matters thus far, it is easiest to say what is not true for matrices.
The rules that fail are

1. AB ≠ BA.
2. Throughout an algebraic manipulation, a vector must always be
treated as a 1-by-n matrix (or always as an n-by-1 matrix); it cannot
change from one form to the other.
3. (a · b)c ≠ a(b · c).


Section 2.4 Exercises 


Summary of Exercises 

Exercises 1-16 develop skill with simple matrix algebra manipulations.
Exercises 17-23 deal with transposes and symmetric matrices. Exercises 24-34
involve verifying rules of matrix algebra.


1. Evaluate the following products involving the 1's vector 1, the identity
matrix I, and the ith unit vector $e_i$ = [0, 0, ..., 1, ..., 0, 0]. Assume
that all have size n.


(a) I? (b) 1-1 (c) I (d) Ie, (e)1l-e (fee 
(ye-e GAs) (HIM (i ele @ ~J) 


2. (a) Write the following system of equations in matrix form. 


3x, + de + 7%, = 8 
2x, — X, + x, =4 
6 


x, + Gx. — 2%; 


(b) Rewrite the matrix equation in part (a) to reflect the operation of
bringing the right side over to the left side (so that the right sides
are now 0's).


3. (a) Write the following system of equations in matrix form. 


$$
\begin{aligned}
2x_1 - 3x_2 &= x_1 \\
5x_1 + 4x_2 &= x_2
\end{aligned}
$$


(b) Rewrite the matrix equation in part (a) to reflect the operation of 
bringing the right-side variables over to the left side. Your new 
equation should be of the form Qx = 0, where Q is a matrix 
expression involving I, the identity matrix. 


Hint: See Example 1. 


4. Repeat Exercise 3, parts (a) and (b) for the following system of equa- 
tions. 
3x1 + 2x2 = 2x1
4x, a, 3X5 —= 2X5 
5. Consider the system of equations 
2k 3x, = 2h = Sy, + 2yz. = Sey + 20 


x, + 4x, + 3x; = — 4y, + 4y, — 120 
SX + 2X >= X3 = 2y; -— 2y3 + 350 


| 
a 
= 


(a) Write this system of equations in matrix form. (Define the vectors 
and matrices you introduce.) 

(b) Rewrite in matrix form with all the variables on the left side and 
just numbers on the right. 

(c) Rewrite in matrix form so that x, is the only term on the left in the 
first equation, x, is the only term on the left in the second equation, 
and x; is the only term on the left in the third equation (similar to 
the form of the Leontief economic model). 


6. (a) Consider the following system of equations for the growth of rabbits 
and foxes from year to year: 


R' 
F' 


LOR <a 200 
on + SE ESD 


Write this system in matrix form, where p = [R, F] and p’ = 
[R’, F’]. 

(b) Write a matrix equation for p'', the vector of rabbits and foxes after
2 years.

(c) Write a matrix equation for $p^{(3)}$, the vector of rabbits and foxes
after 3 years.


(d) Using summation notation (Σ), write a matrix equation for $p^{(n)}$, the
vector of rabbits and foxes after n years.


7. (a) Write the rabbit—fox equations 


R= R+ JR — SF 
F' = F + .2R — .3F 


in matrix form using p = [R, F], p’ = [R’, F’], and A = 


b - i 

2 a3 

(b) Rewrite the equation from part (a) in the form p' = Qp, where Q
is some matrix expression.
Hint: Use the identity matrix I.


(c) Let $p^{(20)}$ be the vector of population sizes 20 periods later. Express
$p^{(20)}$ in terms of p, A, and I.
8. Show that if x⁰ is a solution to Ax = b, then rx⁰ is a solution to
Ax = rb.


9. Show that if x⁰ is a solution to Ax = b and if x* is a solution to
Ax = 0, then x⁰ + x* is also a solution to Ax = b.


10. Let A and B be 2-by-2 matrices and x, y, z be 2-vectors such that
Ax = By = [1, 1], Ay = [1, 0], Bx = [0, 1]. Determine z when

(a) z = A(2x - y) (b) z = (A - B)x

(c) z = (A + B)x - (2A + B)y (d) z = (3A + B)(x + y)
(e) z = {[(A + B)y] · [(A + 3B)(x - y)]}1

11. Given a linear model of the form x' = Ax + b, let us expand the
n-by-n matrix A into an (n + 1)-by-(n + 1) matrix A* by including
b, a row vector 0 of 0's, and a 1 in entry (n + 1, n + 1) so that A* has
the form

$$
A^* = \begin{bmatrix} A & b \\ \mathbf{0} & 1 \end{bmatrix}
$$

We should also add to x an (n + 1)st entry equal to 1; call the
new vector x*, and now our linear model has the form x*' = A*x*.
Give the new A* for the following linear models.

(a) x, = 3x; + 2x, + 10 (b) x, = x, + 2 + 5x, + 20 

X> = 4x, — Sx, + 8 X» = 2%, — Io 2k = 10 

x, = 3x, + 4x, + 6x, + 30 


(c) Leontief model in Example 1. 


12. Show that if the second row of A is all 0's, the second row in the
product AB (if defined) is all 0's.


13. Show that if A is the transition matrix of a Markov chain with five
states, 1A1 = 5.


Hint: Write 1A1 = (1A)1 and use the result in Example 2.


14. (a) Extend the reasoning in Example 2 to show that 1 · p'' = 1, where
p'' is the distribution after two periods.
Hint: Write p'' as A(Ap) and use the steps in equations (20) twice.
(b) Prove by induction that the sum of the probabilities in $p^{(n)}$, the
distribution after n periods, equals 1.


15. (a) State the fact that for a Markov transition matrix A, the column
sums in A² equal 1 with a matrix equation involving 1.
Hint: See equation (19).

(b) Use equation (19) to prove the equation you wrote in part (a).

(c) Prove that the column sums in A³ equal 1 using matrix algebra
[follow the reasoning in parts (a) and (b)].

(d) Use induction to prove column sums in $A^n$ equal 1.
16. Let A and B be n-by-n matrices. If C = AB and the second row of A
is five times the first row of A ($a_2^R = 5a_1^R$), show that the second row
of C is five times the first row of C.


Hint: Compare $c_{11} = a_1^R \cdot b_1^C$ with $c_{21} = a_2^R \cdot b_1^C$, similarly for $c_{12}$
versus $c_{22}$, and so on.


17. Verify $(Ab)^T = b^TA^T$ for $b = [b_1, b_2]$ and

$$
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
$$


18. Why are the entries all equal to 6 on the main diagonal of $R^TR$ in
equation (25)?


19. Compute $RR^T$ in Example 3 to find a measure of how much different
students share common views of their subjects.


The faculties in the four divisions of the College of Arts and Sciences 
at Wayward University (Natural Science/Mathematics, Biological Sci- 
ence, Arts & Humanities, Social Science) have taken stands for or 
against the following five issues: 


NS/M Bio A&H SS 


(a) Wayward needs to change its name No Les: Yes. “kes 
(b) Wayward has a friendly campus Yes Yes No No 
(c) CompSci 112 is too hard No No Yes Yes 


(d) The Alfred E. Neuman dorm is ugly No Yes Yes No 
(e) Wayward athletes should be better No No No Yes 


Compute a matrix of similarities between the divisions (remember that, 


by symmetry, you only have to compute the entries on or above the 
main diagonal). 


Let A(G) denote the adjacency matrix of the graph in Figure 2.2 and 
let M(G) denote the incidence matrix of that graph [see equation (13) 


-of Section 2.3]. Show that entry (i, 7) of M(G)M(G)’ equals entry 


(i, j) of A(G), for i ~ j. Explain in words why this result is always 
true. 


(a) Compute AD for the matrices A and D given in the discussion of 
diagonal matrices. 

(b) Show that if D is a diagonal matrix, AD has the effect of multi- 
plying the ith column of A by the ith diagonal entry of D. 3 


Show that if A is symmetric, then A* is symmetric. 
24. Prove that (B + C)A = BA + CA by mimicking the argument in the
text used to show A(B + C) = AB + AC.


25. (a) Verify (AB)c = A(Bc) for A and B arbitrary 2-by-2 matrices and
$c = [c_1, c_2]$.
(b) Extend the result in part (a) to show (AB)C = A(BC) using the
reasoning in equation (6).


26. Show that the identity $A(b \cdot c) \stackrel{?}{=} (Ab) \cdot c$ makes no sense by making
up a 3-by-2 matrix A, a 2-vector b, and a 2-vector c. Compute the
value of $A(b \cdot c)$ and then try to compute $(Ab) \cdot c$.


27. Show that the identity $A(bC) \stackrel{?}{=} (Ab)C$ makes no sense by making up
matrices A, C and vector b with A 3-by-2 so that the matrix expression
A(bC) makes sense (the sizes fit together properly), and then show that
the sizes are wrong for (Ab)C.


28. Verify that bI = b for any $b = [b_1, b_2, \ldots, b_n]$.


29. Verify that IB = B for a 3-by-3 matrix

$$B = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix}$$

by performing the matrix multiplication IB.


30. (a) For a given matrix B, let $b_i^R$ denote the ith row of B. Show that
$\mathbf{1} \cdot b_i^R$ equals the sum of entries in the ith row of B. Show that $B\mathbf{1}$
yields a vector whose ith position is the sum of the entries in the
ith row.

(b) Show that $\mathbf{1}B$ yields a vector whose jth position is the sum of the
entries in the jth column of B.

(c) Show that $\mathbf{1}B\mathbf{1}$ equals the sum of all the entries in B.


31. Give an example involving three 2-vectors to show that $(a \cdot b)c \ne a(b \cdot c)$.


32. Let $e_i$ denote the vector with a 1 in the ith position and 0's elsewhere.
What is the value of $\mathbf{1}Ae_i$?


33. (a) Why is the following identity false: $(AB)^2 = A^2B^2$? What is $(AB)^2$
actually equal to?
(b) Can you find two nonzero matrices A, B for which $(AB)^2 = A^2B^2$?


34. Why is the following identity false: $(A + B)^2 = A^2 + 2AB + B^2$?
What is $(A + B)^2$ actually equal to?
Section 2.5 Scalar Measures of a Matrix: Norms and Eigenvalues


The goal of this section is to express with a single number the magnifying 
effect a matrix A has when A multiplies a vector, as in p’ = Ap. For 
example, will p’ be about twice the size of p? We want to capture in one 
number the "essence" of multiplication by A. The first problem is how to
measure the size of a vector. 

In matrix algebra, the word norm is the name used for the size of a
quantity. For scalars, the standard norm is the absolute value |a|. The most
common norm for an n-vector $a = [a_1, a_2, \ldots, a_n]$ is the euclidean
distance |a| of the point $[a_1, a_2, \ldots, a_n]$ from the origin. This distance is
the square root of the sum of the squares of the $a_i$'s.

$$|a| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2} \tag{1}$$

For example, if a = [-1, 2, 2], then $|a| = \sqrt{1^2 + 2^2 + 2^2} = \sqrt{9} = 3$.
A set of vectors with easily computed norms are the unit vectors
$e_i$, which have a 1 in the ith position and 0 elsewhere. Clearly, formula (1)
gives $|e_i| = 1$.

Because it uses the euclidean distance, the norm in (1) is called the 
euclidean norm. Although the euclidean norm has a nice geometrical in- 
terpretation, this norm is often tedious to compute. Since there are other 
vector norms that are easier to compute and more natural for our work, the 
euclidean norm has limited value in linear models considered in this book. 
Two natural ways to measure the size of a vector are the sum of its entries 
and its largest entry. Since norms need to be nonnegative, we use absolute 
values in defining the sum and largest-entry norms. 


$$\text{Sum norm: } |a|_s = \sum_i |a_i| \qquad\qquad \text{Maximum norm: } |a|_{mx} = \max_i \{|a_i|\}$$

For example, $|[-1, 2, 2]|_s = 1 + 2 + 2 = 5$ and $|[-1, 2, 2]|_{mx} = \max\{1, 2, 2\} = 2$.
Any probability vector will have sum norm = 1; this
is what we would expect for such a vector. The unit vectors $e_i$ have a value
of 1 for both these norms, just as they did for the euclidean norm. We now
write the euclidean norm as $|a|_e$ to avoid confusion.
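
As an informal check of the three vector norms, the following Python sketch evaluates each one for a = [-1, 2, 2]. The function names are illustrative choices, not notation from the text.

```python
import math

def euclidean_norm(a):
    # Square root of the sum of squared entries, as in formula (1).
    return math.sqrt(sum(x * x for x in a))

def sum_norm(a):
    # Sum of the absolute values of the entries.
    return sum(abs(x) for x in a)

def max_norm(a):
    # Largest absolute value among the entries.
    return max(abs(x) for x in a)

a = [-1, 2, 2]
print(euclidean_norm(a), sum_norm(a), max_norm(a))
```

For a unit vector such as [0, 0, 1], all three functions return 1, in agreement with the remark above.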

The norm $\|A\|$ of a matrix A is a bound on the magnifying effect A
has when it multiplies some vector. We define $\|A\|$ to be the (smallest) bound
so that

$$|Ax| \le \|A\| \cdot |x| \tag{2}$$

Thus

$$\|A\| = \max_{\text{all } x} \frac{|Ax|}{|x|} \tag{3}$$

We assume that $x \ne 0$ in (3).
There is an immediate extension of (2) to powers of A.

$$|A^k x| \le \|A\|^k \cdot |x| \tag{4}$$

Since the magnifying effect depends on how the size of the vector is
measured, for each of our three vector norms we get a corresponding matrix
norm: a euclidean matrix norm $\|A\|_e$, a sum matrix norm $\|A\|_s$, and a max
matrix norm $\|A\|_{mx}$. Each of these three matrix norms has its own special
properties.

As with the euclidean norm for vectors, the euclidean norm for matri- 
ces is the most commonly used matrix norm in linear algebra and has the 
best theoretical properties. However, the euclidean norm of a matrix is very 
difficult to calculate, while the sum and max norms are easy to determine. 


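Theorem 1. The sum matrix norm $\|A\|_s$ equals the largest column sum of
A (in absolute value), and the max matrix norm $\|A\|_{mx}$ equals the largest
row sum of A (in absolute value). That is, writing $a_j^C$ for the jth column
and $a_i^R$ for the ith row of A,

$$\text{(i) } \|A\|_s = \max_j \left(|a_j^C|_s\right) \qquad\qquad \text{(ii) } \|A\|_{mx} = \max_i \left(|a_i^R|_s\right)$$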


The proof of Theorem 1, part (i) is given below [part (ii) is left to the 
Exercises]. First let us illustrate these formulas. 


Example 1. Sum and Max Norms of a Matrix


Use Theorem 1 to determine the sum and max norms of A.

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

The last column has the largest column sum, so the sum matrix norm
of A is $\|A\|_s = 3 + 6 + 9 = 18$. The last row has the largest row
sum, so the max matrix norm of A is $\|A\|_{mx} = 7 + 8 + 9 = 24$.

Let us see how these norms bound the magnifying effect of multiplying
a vector x by A. Since $|Ax| \le \|A\| \cdot |x|$, using the sum and
max norms, we have

$$|Ax|_s \le 18|x|_s \qquad\qquad |Ax|_{mx} \le 24|x|_{mx} \tag{5}$$

We can attain the sum norm bound in (5) using $e_3 = [0, 0, 1]$, with
$|e_3|_s = 1$. The bound for $Ae_3$ is

$$|Ae_3|_s \le \|A\|_s \cdot |e_3|_s = 18 \times 1 = 18$$

and $|Ae_3|_s$ equals this bound:

$$Ae_3 = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix}, \qquad |Ae_3|_s = 3 + 6 + 9 = 18 \tag{6}$$

The reader should check that we attain the max norm bound of
24 with the 1's vector x = 1 = [1, 1, 1].
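
The column-sum and row-sum formulas of Theorem 1 translate directly into a few lines of code. The Python sketch below uses hypothetical helper names (`matrix_sum_norm`, `matrix_max_norm`, `matvec`) to compute both norms for the matrix with rows [1, 2, 3], [4, 5, 6], [7, 8, 9] and to check that the unit vector [0, 0, 1] and the 1's vector attain the two bounds.

```python
def matrix_sum_norm(A):
    # Largest column sum of absolute values (Theorem 1, part (i)).
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def matrix_max_norm(A):
    # Largest row sum of absolute values (Theorem 1, part (ii)).
    return max(sum(abs(x) for x in row) for row in A)

def matvec(A, x):
    # Matrix-vector product Ax, one scalar product per row.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

Ae3 = matvec(A, [0, 0, 1])   # picks out the third column
A1 = matvec(A, [1, 1, 1])    # each entry is a row sum

print(matrix_sum_norm(A), sum(abs(x) for x in Ae3))
print(matrix_max_norm(A), max(abs(x) for x in A1))
```

Both printed pairs agree, which is the statement that the bounds are attained.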


The use of a unit vector and the 1's vector to achieve the norm bounds
in (5) always works for matrices with nonnegative entries.


Theorem 2. Let A be a matrix with nonnegative entries.
(i) Sum norm: If the jth column of A has the largest sum, unit vector
$e_j$ achieves the sum norm bound: $|Ae_j|_s = \|A\|_s |e_j|_s$.
(ii) Max norm: The 1's vector $\mathbf{1}$ achieves the max norm bound:

$$|A\mathbf{1}|_{mx} = \|A\|_{mx} |\mathbf{1}|_{mx}$$


Proof of Theorems 1 and 2, part (i): First we note that in the definition
(3) of the sum norm $\|A\|_s = \max\,(|Ax|_s/|x|_s)$, it is sufficient to consider
only vectors x with $|x|_s = 1$. For if $|x|_s = k$, then y = (1/k)x has
sum norm 1 and $|Ay|_s/|y|_s = |Ax|_s/|x|_s$.

For concreteness, we work with the matrix A in Example 1. We
want to find an x, $|x|_s = 1$, that maximizes $|Ax|_s$ ($= |Ax|_s/|x|_s$). Let
us write the matrix-vector product Ax in the following form:

$$Ax = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} + x_2 \begin{bmatrix} 2 \\ 5 \\ 8 \end{bmatrix} + x_3 \begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix} \tag{7}$$

With $|x_1| + |x_2| + |x_3| = 1$, we must pick the $x_i$'s to make the linear
combination of column vectors in (7) as large as possible in the sum norm.
Since $|Ax|_s \le |x_1||a_1^C|_s + |x_2||a_2^C|_s + |x_3||a_3^C|_s$ is a weighted average
of the column sums, it cannot exceed the largest column sum. Thus (7)
is maximized when the $x_i$ associated with the largest column (column
3) is 1 and the other $x_i$'s are 0. The maximizing x is [0, 0, 1]
and the sum norm of (7) with this x is $|a_3^C|_s$, the sum of the third
column's entries. This reasoning is valid for any matrix.


A simple alteration of the 1’s vector is required to achieve the max 
norm when A has negative entries (see Exercise 25). 
Example 2. Norm Bound on Growth in Rabbit-Fox Population Model


Example 3 of Section 1.3 started with the rabbit-fox growth model

$$\begin{aligned} R' &= R + .2R - .3F \\ F' &= F - .1F + .1R \end{aligned} \tag{8a}$$

or

$$\begin{bmatrix} R' \\ F' \end{bmatrix} = \begin{bmatrix} 1.2 & -.3 \\ .1 & .9 \end{bmatrix} \begin{bmatrix} R \\ F \end{bmatrix} \tag{8b}$$

Let p = [R, F], p' = [R', F'] and A be the matrix of coefficients in
(8b). The sum norm (largest absolute-value column sum) and max
norm (largest absolute-value row sum) of A are

$$\|A\|_s = 1.3 \qquad\qquad \|A\|_{mx} = 1.5 \tag{9}$$

Using (2), we have $|p'| = |Ap| \le \|A\| \cdot |p|$, or by (9),

$$|p'|_s \le 1.3|p|_s \quad \text{and} \quad |p'|_{mx} \le 1.5|p|_{mx} \tag{10}$$

When we started with R = 100, F = 100, we found that the populations
declined to extinction; we did not get close to the norm bounds.
When we started with R = 100, F = 50, the populations grew
initially. Let us get a bound from (10) on this growth. For p =
[100, 50], we have $|p|_s = 150$. So (10) yields the sum norm bound

$$|p'|_s \le 1.3|p|_s = 1.3 \times 150 = 195$$

From (8a) we compute R' = 105, F' = 55, so $|p'|_s = 160$. Thus for
p = [100, 50], the sum norm bound is a decent estimate.
Bound (4) can be used for the population $p^{(k)}$ after k periods:

$$|p^{(k)}|_s = |A^k p|_s \le \|A\|_s^k |p|_s = (1.3)^k \times 150$$
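
The k-period bound can be compared with the actual populations by iterating the rabbit-fox model R' = 1.2R - .3F, F' = .1R + .9F from p = [100, 50]. The Python sketch below is only an illustration: it assumes exact rational arithmetic through the standard `fractions` module, and the helper name `iterate` is hypothetical.

```python
from fractions import Fraction as F

# Coefficient matrix of the rabbit-fox model, entries kept as exact fractions.
A = [[F(12, 10), F(-3, 10)],
     [F(1, 10), F(9, 10)]]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def sum_norm(x):
    return sum(abs(xi) for xi in x)

def iterate(A, p, periods):
    # Returns [p, Ap, A^2 p, ...] up to the requested number of periods.
    history = [p]
    for _ in range(periods):
        p = matvec(A, p)
        history.append(p)
    return history

history = iterate(A, [F(100), F(50)], 5)
for k, p in enumerate(history):
    bound = F(13, 10) ** k * 150      # (1.3)^k x 150
    print(k, float(sum_norm(p)), float(bound))
```

The printed table shows the actual sum norm staying below the bound in every period, with the gap widening as k grows.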


Example 3. Sum Norm of a Markov Chain Transition Matrix


Since the sum norm equals the largest column sum and the entries in
every column $a_j^C$ sum to 1 in a Markov transition matrix, it follows
that such a matrix A has sum norm $\|A\|_s = 1$. Because powers of a
transition matrix have column sums of 1, all powers also have a sum
norm of 1.


Any probability vector p achieves the sum norm bound:

$$|Ap|_s = \|A\|_s |p|_s \quad \text{or} \quad |Ap|_s = |p|_s \quad (\text{since } \|A\|_s = 1)$$

since the vectors p and Ap (= p') both have sum norm 1.


Example 4. Norm of Demand Matrix in Leontief Economic Model


The Leontief economic model, given in Example 2 of Section 1.2, has
an input constraint that the (nonnegative) entries in each column $d_j^C$ of
the demand matrix D sum to < 1. This meant that it cost less than 1
dollar (of inputs) to produce 1 dollar worth of the jth product. It follows
immediately that $\|D\|_s < 1$.

In Section 3.4 we will see how $\|D\|_s < 1$ guarantees that the
Leontief model always has a solution.


The following properties are true for any matrix norm:

$$|r||x| = |rx| \quad \text{and} \quad \|rA\| = |r| \cdot \|A\|$$
$$\|AB\| \le \|A\| \cdot \|B\| \tag{11}$$
$$\|A^k\| \le \|A\|^k \tag{12}$$
$$\|A + B\| \le \|A\| + \|B\|$$


One of the most important uses of norms is to determine error bounds. 


Example 5. Use of Matrix Norm in Error Bounds


Consider the following growth model for the numbers of computers C
and dogs D in successive years:

$$\begin{aligned} C' &= 3C + D \\ D' &= 2C + 2D \end{aligned} \tag{13}$$

The sum norm $\|A\|_s$ of the coefficient matrix A is 5 (= the sum of
coefficients in the first column). If c = [C, D] is the initial numbers
of computers and dogs and c' = [C', D'], then c' = Ac.

Suppose that there is an error in determining c and we mistakenly
use the initial vector b, where b = c + e; here e is the vector of
errors. Then the error 1 year later is

$$b' - c' = Ab - Ac = A(b - c) = Ae \tag{14}$$

Taking (sum) norm bounds, we have the error bound

$$|b' - c'|_s \le \|A\|_s |e|_s = 5|e|_s \tag{15}$$

If we know that the sum of the errors is no more than 2, then our error
after 1 year is at most $5|e|_s = 5 \cdot 2 = 10$.
Following the model for n years, we let $c^{(n)}$ denote the numbers
of computers and dogs after n years. Then

$$c^{(n)} = A^n c$$

and after n years, the error is

$$b^{(n)} - c^{(n)} = A^n b - A^n c = A^n(b - c) = A^n e \tag{16}$$

Taking norms, we have the error bound

$$|b^{(n)} - c^{(n)}|_s \le \|A^n\|_s |e|_s \le \|A\|_s^n |e|_s = 5^n |e|_s \tag{17}$$

For example, suppose that c = [20, 10] but b = [22, 11], so
e = [2, 1] with $|e|_s = 3$. Then computing c' and b', we find that

$$c' = Ac = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} 20 \\ 10 \end{bmatrix} = \begin{bmatrix} 70 \\ 60 \end{bmatrix} \quad \text{and} \quad b' = Ab = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} 22 \\ 11 \end{bmatrix} = \begin{bmatrix} 77 \\ 66 \end{bmatrix}$$

and b' - c' = [7, 6], with $|b' - c'|_s = 13$. The sum norm bound
on this error is, from (15),

$$|b' - c'|_s \le \|A\|_s |e|_s = 5 \cdot 3 = 15$$

This bound of 15 compares well with the observed error of 13. If we
had iterated n times, the error between $b^{(n)} = A^n b$ and $c^{(n)} = A^n c$
would be bounded, using (17), by $\|A\|_s^n |e|_s = 5^n \cdot 3$.

Repeating the analysis above using the max norm yields

$$|b' - c'|_{mx} \le \|A\|_{mx} |e|_{mx} = 4 \cdot 2 = 8$$

This max bound of 8 compares well with the observed max error
of 7.
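
The error bounds in this example are easy to check numerically. The sketch below, in Python with hypothetical helper names, propagates the error vector e = b - c = [2, 1] through the model C' = 3C + D, D' = 2C + 2D and compares its sum norm after n years with the bound 5^n times 3.

```python
A = [[3, 1],
     [2, 2]]
c = [20, 10]   # correct starting vector
b = [22, 11]   # mistaken starting vector

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def sum_norm(x):
    return sum(abs(xi) for xi in x)

def max_norm(x):
    return max(abs(xi) for xi in x)

def error_after(n):
    # The error after n years is A^n e, where e = b - c.
    e = [bi - ci for bi, ci in zip(b, c)]
    for _ in range(n):
        e = matvec(A, e)
    return e

for n in range(1, 5):
    e_n = error_after(n)
    print(n, e_n, sum_norm(e_n), 5 ** n * 3)
```

For n = 1 the loop reproduces the error vector [7, 6] found above, and in each later year the observed sum-norm error remains under the bound.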


The norm $\|A\|$ provides a single-number bound for the magnifying
effect of multiplying a vector x by a matrix A. When A is a square matrix,
sometimes multiplying by A has exactly the same effect as multiplying by
a single number $\lambda$; that is, for some vector u, Au equals $\lambda u$. When $Au = \lambda u$
(and $u \ne 0$), the vector u is called an eigenvector of A (eigen is the
German word for "proper") and the scalar $\lambda$ is called an eigenvalue of A.


Example 6. Eigenvalue in a Growth Model


Consider again the growth model (13) for computers (C) and dogs (D)
from year to year.

$$\begin{aligned} C' &= 3C + D \\ D' &= 2C + 2D \end{aligned} \tag{18}$$

If initially we had C = 1, D = 1, then we compute C' = 4, D' =
4. Letting C = 4, D = 4, we obtain C' = 16, D' = 16. Whenever
[C, D] = [a, a], then [C', D'] = [4a, 4a]. So 4 is an eigenvalue of
(18) and any vector of the form [a, a] is an eigenvector.
Observe that

$$A^2[a, a] = A(A[a, a]) = A([4a, 4a]) = [16a, 16a]$$

and in general,

$$A^k[a, a] = 4^k[a, a]$$

Note that if initially we had the (nonsense) vector [C, D] =
[1, -2], then [C', D'] = [1, -2]. So 1 is also an eigenvalue of (18)
with eigenvector [1, -2] (or any multiple of [1, -2]).


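Example 7. Stable Probability Vector for Weather Markov Chain


In Example 1 of Section 1.3 we introduced the following Markov chain
for sunny and cloudy weather:

                         Today
                    Sunny    Cloudy
Tomorrow   Sunny     3/4      1/2
           Cloudy    1/4      1/2

We claim that p* = [2/3, 1/3] is a stable probability distribution for this
transition matrix A. That is,

$$Ap^* = \begin{bmatrix} \tfrac{3}{4} & \tfrac{1}{2} \\ \tfrac{1}{4} & \tfrac{1}{2} \end{bmatrix} \begin{bmatrix} \tfrac{2}{3} \\ \tfrac{1}{3} \end{bmatrix} = \begin{bmatrix} \tfrac{3}{4} \times \tfrac{2}{3} + \tfrac{1}{2} \times \tfrac{1}{3} \\ \tfrac{1}{4} \times \tfrac{2}{3} + \tfrac{1}{2} \times \tfrac{1}{3} \end{bmatrix} = \begin{bmatrix} \tfrac{2}{3} \\ \tfrac{1}{3} \end{bmatrix} = p^*$$

So Ap* = 1p*, and p* is an eigenvector of A with eigenvalue 1.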
In the frog Markov chain (Example 2 of Section 1.3) we found by
experimental computation that the probability vector p* = [.1, .2, .2, .2,
.2, .1] had the property that Ap* = p* for the frog transition matrix A.
Again this p* was an eigenvector with eigenvalue 1.

This property of matrix multiplication acting like scalar multiplication 
for certain vectors happens for all matrices. It is the key to understanding 
the behavior of many linear models. An n-by-n matrix usually has n eigen- 
values, each with an infinite collection of eigenvectors. 

Observe that if u is an eigenvector of A with $Au = \lambda u$, then any
multiple ru of u is also an eigenvector, since $A(ru) = r(Au) = r(\lambda u) = \lambda(ru)$.

The following example shows how eigenvectors provide a simplifying 
way to understand matrix-vector computations. 


Example 8. Eigenvectors as a Coordinate System

The computer-dog growth model from Example 6 has the form

$$x' = Ax, \quad \text{where } A = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}, \quad x = [C, D]$$

Earlier we saw that the two eigenvalues and associated eigenvectors
of A are $\lambda_1 = 4$ with u = [1, 1] and $\lambda_2 = 1$ with v = [1, -2].

Suppose that we want to determine the effects of this growth
model over 20 periods with the starting vector x = [1, 7]. Let us
express x as a linear combination of u and v. By a method to be
explained shortly, we find that

x = 3u - 2v (i.e., [1, 7] = 3[1, 1] - 2[1, -2])

With matrix algebra, we can write

$$Ax = A(3u - 2v) = 3Au - 2Av = 3(4u) - 2(1v) = 12u - 2v \tag{19}$$

where the third equality holds since u, v are eigenvectors.
For 20 periods, we have

$$\begin{aligned} A^{20}x = A^{20}(3u - 2v) &= 3A^{20}u - 2A^{20}v \\ &= 3(4^{20}u) - 2(1^{20}v) \\ &= 3 \cdot 4^{20}[1, 1] - 2[1, -2] \\ &= [3 \cdot 4^{20}, 3 \cdot 4^{20}] - [2, -4] \end{aligned} \tag{20}$$

Note how the eigenvector with the larger eigenvalue swamps the other
eigenvector. The relative effect of the other eigenvector is so small
that it can be neglected. So after n periods we have

$$A^n x \approx A^n(3u) = 3 \cdot 4^n u = [3 \cdot 4^n, 3 \cdot 4^n] \tag{21}$$

This is a lot easier than multiplying $A^n x$ out directly for various n.
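
To see the eigenvector shortcut agree with brute-force multiplication, the following Python sketch computes the 20-period result for x = [1, 7] both ways using exact integer arithmetic. The function names are hypothetical; the decomposition x = 3u - 2v is taken from the example.

```python
A = [[3, 1],
     [2, 2]]
u, v = [1, 1], [1, -2]   # eigenvectors for eigenvalues 4 and 1

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def power_direct(A, x, n):
    # Multiply by A repeatedly, n times.
    for _ in range(n):
        x = matvec(A, x)
    return x

def power_by_eigenvectors(n):
    # x = 3u - 2v, so A^n x = 3 * 4^n * u - 2 * 1^n * v.
    return [3 * 4 ** n * ui - 2 * vi for ui, vi in zip(u, v)]

x = [1, 7]
exact = power_direct(A, x, 20)
dominant = [3 * 4 ** 20, 3 * 4 ** 20]   # approximation keeping only the u term

print(exact == power_by_eigenvectors(20))
print([e - d for e, d in zip(exact, dominant)])
```

The second line printed is the part contributed by v, which is tiny next to entries of size 3 times 4^20.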


Let us generalize the result in Example 8. Suppose that u and v are
eigenvectors of a 2-by-2 matrix A with eigenvalues $\lambda_1$ and $\lambda_2$, respectively,
with $\lambda_1 > \lambda_2 > 0$. Suppose we can express the vector x as a linear
combination of u and v (e.g., x = au + bv). Then using the laws of matrix
algebra, the matrix-vector products Ax and $A^2x$ can be calculated as

$$\begin{aligned} Ax &= A(au + bv) = aAu + bAv = a\lambda_1 u + b\lambda_2 v \\ A^2x &= A^2(au + bv) = aA^2u + bA^2v = a\lambda_1^2 u + b\lambda_2^2 v \end{aligned} \tag{22}$$

and more generally,

$$A^n x = a\lambda_1^n u + b\lambda_2^n v \tag{23}$$

As noted in Example 8, for large n, $\lambda_1^n$ will be much larger than $\lambda_2^n$, since
$\lambda_1 > \lambda_2$, so we have

$$A^n x \approx a\lambda_1^n u \tag{24}$$

This is clearly a very simple way to follow growth models over many
periods.


Theorem 3. Let $\lambda^*$ be the largest eigenvalue of A (strictly larger in absolute
value than other $\lambda$'s) and u* a corresponding eigenvector. Then for
any vector x, the expression $A^n x$ approaches a multiple of u* as n
becomes large.


There is an implicit message about A in writing a vector x as a linear
combination of eigenvectors and in saying that $A^n x$ approaches a multiple of
u*. The latter statement says that somehow, $A^n$ must be closely related to
u*. In Section 5.5 we show when A is symmetric that A can be decomposed
into a set of "simple" matrices generated by the different eigenvectors, and
that $A^n$ approaches the "simple" matrix generated by u*. In Sections 3.1,
3.4, and 5.5 we learn different ways to find eigenvalues and eigenvectors
of a square matrix.

The one missing step for us at this point is how to determine a and b
so that x = au + bv. Recall from Chapter 1 that anytime two variables
must be determined, the calculations are bound to involve two linear
equations in these two unknowns.

If $x = [x_1, x_2]$, $u = [u_1, u_2]$, and $v = [v_1, v_2]$, the statement x =
au + bv is actually a system of equations

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = a \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} + b \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$$

or

$$\begin{aligned} x_1 &= u_1 a + v_1 b \\ x_2 &= u_2 a + v_2 b \end{aligned}$$

In the computer-dog growth model with x = [1, 7], u = [1, 1],
v = [1, -2], this system of equations becomes

$$\begin{aligned} 1 &= 1a + 1b \\ 7 &= 1a - 2b \end{aligned}$$

which can be solved by elimination, yielding a = 3, b = -2.
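
For readers who want to automate this step, here is a minimal Python sketch. It does not perform the elimination step by step; instead it uses the closed-form solution of a 2-by-2 system (Cramer's rule), which gives the same a and b. The function name `coordinates` is hypothetical, and integer inputs are assumed.

```python
from fractions import Fraction

def coordinates(x, u, v):
    # Solve x = a*u + b*v for a and b, where x, u, v are 2-vectors of integers.
    det = u[0] * v[1] - v[0] * u[1]
    if det == 0:
        raise ValueError("u and v are multiples of each other")
    a = Fraction(x[0] * v[1] - v[0] * x[1], det)
    b = Fraction(u[0] * x[1] - x[0] * u[1], det)
    return a, b

a, b = coordinates([1, 7], [1, 1], [1, -2])
print(a, b)
```

The call reproduces a = 3, b = -2 for the computer-dog model; the `ValueError` branch covers the case where u and v do not form a usable coordinate system.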


In closing, we observe that the norm of a matrix (any norm) provides
an upper bound on the size of eigenvalues. The norm is a bound on the
magnifying effect of a matrix A: $|Ax| \le \|A\| \cdot |x|$. If $Au = \lambda u$, so $|Au| =
|\lambda||u|$, it follows that $|\lambda| \le \|A\|$. Check that this was true in Example 6.
Typically, the largest eigenvalue (in absolute value) $|\lambda|$ is very close to $\|A\|$
(in any norm). One can show (see Exercise 30) that the sum and max norms
of a 2-by-2 symmetric matrix A of the form $\begin{bmatrix} a & b \\ b & a \end{bmatrix}$ equal the largest
(absolute) eigenvalue.
Example 9. Norm and Eigenvalues of a Symmetric 2-by-2 Matrix


We claim that $x_1 = [1, 1]$ and $x_2 = [-1, 1]$ are eigenvectors of the
matrix

$$A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$$

Computation shows that $Ax_1 = [5, 5] = 5x_1$ and $Ax_2 = [-1, 1] =
x_2$. Thus 5 and 1 are the eigenvalues associated with $x_1$ and $x_2$.

As asserted above, the larger eigenvalue of such a symmetric
2-by-2 matrix equals its norm (for all three matrix norms). The larger
eigenvalue is 5, so $\|A\| = 5$. Checking, we see that 5 is the sum of
each column and row, so by Theorem 1, 5 is the sum and max norm
of A.
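
The claims in this example can be confirmed with a short computation. The Python sketch below assumes the matrix with rows [3, 2] and [2, 3], which is the only 2-by-2 matrix consistent with the stated products $Ax_1 = [5, 5]$ and $Ax_2 = [-1, 1]$; the helper names are hypothetical.

```python
A = [[3, 2],
     [2, 3]]
x1, x2 = [1, 1], [-1, 1]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matrix_sum_norm(A):
    # Largest column sum of absolute values.
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def matrix_max_norm(A):
    # Largest row sum of absolute values.
    return max(sum(abs(x) for x in row) for row in A)

print(matvec(A, x1), matvec(A, x2))
print(matrix_sum_norm(A), matrix_max_norm(A))
```

The first line shows the eigenvalues 5 and 1 at work; the second shows that both easily computed norms equal the larger eigenvalue.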


Section 2.5 Exercises 


Summary of Exercises 
Exercises 1-25 involve the norms of vectors and matrices, with Exercises
13-25 being of a more "theoretical" nature. Exercises 26-31 discuss the
determination of eigenvalues and eigenvectors and their use in computing
$A^k x$.


1. Show that the euclidean norm of a equals $\sqrt{a \cdot a}$.
2. Give the euclidean norm, sum norm, and max norm of the following
vectors.

(a) [1, 1, 1] (b) (3, 0, 0] (c) [—1, 1, 4] (a) (-—1433] 
(e) [4, 4, 4, 4] 


3. The distance between two vectors a, b is defined to be the norm of
their difference, |a - b|.

(a) What is the distance between the following vectors [2, 5, 7] and 
[3, —1, 4] using the euclidean norm, sum norm, and max norm? 

(b) Explain in words what the distance between two vectors in the sum 
norm measures. 

(c) Repeat part (b) for the max norm. 


4. Give the sum and max norms of the following matrices.


1 4 o. 3 ss 
(a) $ : | ()|6 1 3 
Mee ke ig 


5. (a) For each of the matrices in Exercise 4, give the vector x* such that
$|Ax^*|_s = \|A\|_s \cdot |x^*|_s$.
(b) For each of the matrices in Exercise 4, give the vector x* such that
$|Ax^*|_{mx} = \|A\|_{mx} \cdot |x^*|_{mx}$.


6. (a) What is the sum norm of the following matrix?


2 4 —-5 
A=]|-3 3 3 
4 }) “2 


(b) If v is a vector with sum norm = 3, give an upper bound on the 
sum norm of Av. 

(c) Give a vector with sum norm = 3 for which the bound in part (b) 
is achieved. 

(d) If w is a vector with sum norm = 5, give an upper bound on the 
sum norm of A’w. 


7. (a) What is the max norm of the following matrix?


Lo 3 
A=]2 1 
Mos 


2 
3 
(b) If v is a vector with max norm = 4, give an upper bound on the
max norm of Av.

(c) Give a vector with max norm = 4 for which this bound in part (b) 
is achieved. 

(d) If w is a vector with max norm = 6, give an upper bound on the 
max norm of A?w. 


8. In the rabbit-fox model in Example 2, give a bound on the size of


p® = A°p in the sum and max norms when p = [100, 100). 


9. (a) In the following rabbit-fox models,


(i) R' = R + .1R — .15F (ii) R' = R + .2R — .SF 
P= Fo AR = Vor F' = F + AR — .2F 


Determine the sum and max norms of the coefficient matrix A. 

(b) If the current vector of population sizes is p = [20, 20], determine 
bounds (in sum and max norms) for the size of p’ = Ap. Compute 
p’ and see how close it is to the norm bounds. 


(c) Compute a sum norm bound on the size of the population vector
after three periods, $p^{(3)} = A^3 p$.


10. (a) In the following model for the growth of rabbits, foxes, and humans,


R=SR + RK Ae — 2 
=F + AR —..2F — 3H 
YW = ff + IR + 1F + 18 


determine the sum and max norms of the coefficient matrix A. 
(b) If the current vector of population sizes is p = [10, 10, 10], de- 
termine bounds (in sum and max norms) for the size of p’ = Ap. 
Compute p’ and see how close it is to the norm bounds. 
(c) Give a sum norm bound on the size of the population vector after four
periods, $p^{(4)}$.


11. In Example 5, suppose that we assume c = [15, 5] when the correct
value is actually [14, 7]. What is the maximum size that the error could
be after 3 years (using the sum norm)?


12. In the rabbit-fox model in Example 2, suppose the initial vector of
p = [100, 100] actually should have been [95, 103]. How large an 
error is possible in p’, in p® (using the sum norm)? 


13. Whereas Example 5 discussed the absolute size of errors, it is often
more interesting to consider the relative size of errors. The relative error
in b is |b - c|/|c| if b is used when c really should be used. (Use the
sum norm.)

(a) If R' = R + F, F' = 3R - 4F and p = [R, F] was set equal to
[3, 1] when it really should have been [2, 2], what is the relative
error in p and what is the relative error in p'?
(b) If R' = 2R - 3F, F' = -5R + 7F and p = [R, F] was set
equal to [5, 5] when it really should have been [6, 4], what is the
relative error in p and what is the relative error in p'?


14. (a) Describe those vectors for which the euclidean norm and max norm
are equal. Explain the reason for your answer.

(b) Describe those vectors for which the sum norm and max norm are
equal. Explain the reason for your answer.

(c) Describe those vectors for which the euclidean norm, sum norm,
and max norm are all equal. Why must these be the only vectors
with this property?


15. We prove that $|a|_{mx} \le |a|_e \le |a|_s$.
(a) Show that the max norm of a vector a is always less than or equal
to the euclidean norm of a.

(b) Show that the euclidean norm of a vector a is always less than or
equal to the sum norm of a.


16. Let a be an n-vector.
(a) Show that $|a|_s \le n|a|_{mx}$. (b) Show that $|a|_s \le \sqrt{n}\,|a|_e$.


17. Show that $|a \cdot b| \le |a|_s \cdot |b|_{mx}$.


18. (a) Show that $|a + b|_s \le |a|_s + |b|_s$.
(b) Repeat part (a) for the max norm.


19. Explain why the sum norm and max norm of a symmetric matrix are
the same (symmetric means $a_{ij} = a_{ji}$).


20. Let $A^T$ be the transpose of A ($A^T$ is obtained from A by interchanging
rows and columns). Show that $\|A^T\|_{mx} = \|A\|_s$ and $\|A^T\|_s = \|A\|_{mx}$.


21. (a) Show that $\|A + B\|_s \le \|A\|_s + \|B\|_s$.
(b) Repeat part (a) for the max norm.


22. (a) Use the fact $|Ax|_s \le \|A\|_s |x|_s$ to show that $\|AB\|_s \le \|A\|_s \|B\|_s$.
(b) Use part (a) and Exercise 20 to show that $\|AB\|_{mx} \le \|A\|_{mx} \|B\|_{mx}$.

Necessary fact: $(AB)^T = B^T A^T$.


23. If A is a matrix that is all 0's except one entry that has value a, show
that $\|A\|_s = \|A\|_{mx} = |a|$.


24. (a) Give the adjacency matrix A(G) for this graph G.

[Figure: graph G]

(b) What is the sum norm of A(G)?
(c) Explain in words an interpretation that can be given to the sum
norm of the adjacency matrix of a graph.


25. (a)
Show that for the matrix A = E 4 , a 2-vector v = |[a, b] with 
$0 \le a \le 1$, $0 \le b \le 1$ will maximize the max norm of Av by
setting a = b = 1.

(b) Generalize your argument to show that for any 2-by-2 matrix A
with nonnegative entries and any 2-vector v with max norm
$|v|_{mx} = 1$, the value of $|Av|_{mx}$ is maximized when v = [1, 1] and
$|Av|_{mx}$ = maximum row sum (in absolute value) of A.

(c) Explain how to modify v if A has negative entries.
Hint: Possibly change one (or both) of the 1's in v to -1's.

(d) Generalize parts (b) and (c) to n-by-n matrices.


cr 
The matrix k é has eigenvectors u, = [1, 1] and u, = 


(1, O]. What are the corresponding eigenvalues for these eigen- 
vectors? 


2 l 


[1, —1]. What are the corresponding eigenvalues for these eigen- 
vectors? 


The matrix has eigenvectors u, = [1], 1] and u, = 


| 6 
The matrix i = has eigenvectors u, = [—2, 1] and 


u, = [-—3, 2]. What are the corresponding eigenvalues for these 
eigenvectors? 


The matrix 
a4 4% 
a. VERS 
—<“S 2 2h 
has eigenvectors u, = [2, 0, 1], uy = [6, 4, 5], and u, = 


[4, 3, 2]. What are the corresponding eigenvalues? 


Verify for each Markov transition matrix A that the given vector is a 
stable probability vector. 


(a) 


(b) 


_|'4 tt 
& = , 1’ p = [2, 2] 


| 


wal tale WI 
(c) A =


wir wi 
vK- © we 
wh win © 
io 
I 
ry 
mele 
=) 
leo 
—) 


28. The matrix $A = \begin{bmatrix} 2 & 5 \\ 6 & 1 \end{bmatrix}$ has eigenvalue $\lambda_1 = 7$ with eigenvector $u_1 =
[1, 1]$ and $\lambda_2 = -4$ with $u_2 = [-5, 6]$.

(a) We want to compute A*v, where v = [—2, 9]. Writing v as v = 
3u, + u,, compute A*y indirectly as in Example 8. 

(b) Give an approximate formula for $A^n v$.

(c) Use the method discussed following Example 8 to determine a and 
b so that the vector v = [2, 13] can be written as v = au, + bu,, 
and use this representation of v to compute A°v. 


Red 
29. The matrix A = i 1 has eigenvectors u,; = [1, 1] and u, = 


[1, 0].

(a) We want to compute A‘*v, where v = [3, 1]. Writing v as v = 
u, + 2u,, compute A*v indirectly as in Example 8. 

(b) Give an approximate formula for $A^n v$.

(c) Use the method discussed following Example 8 to determine a and 
b so that the vector v = [6, 9] can be written as v = au, + bu,, 
and use this representation of v to compute Av. 


30. Show that if A is a 2-by-2 symmetric matrix of the form $\begin{bmatrix} a & b \\ b & a \end{bmatrix}$, then
the eigenvalues of A are a + b and a - b. Verify that [1, 1] and
[1, -1] are the associated eigenvectors.

Note: $a + b = \|A\|_s = \|A\|_{mx}$.


31. Show that if $\lambda$ is an eigenvalue of a matrix A, then $\lambda^2$ is an eigenvalue
of $A^2$.
Efficient Matrix Computation 


Computational Complexity and 
Error Analysis 


In this section we discuss computational details of matrix multiplication. A 
lot of arithmetic must be done when two matrices are multiplied, and despite 
the great speed of modern computers, theoretical shortcuts are still needed 
when large matrices must be multiplied repeatedly. It is also important to 
know the relative complexity of different basic matrix operations. For ex- 
ample, which is faster, squaring an n-by-n matrix A or solving the system 
Ax = b of n equations in n unknowns? (Would you have guessed that
normally solving Ax = b is faster?)

Before looking for shortcuts, let us first determine the computational
complexity of matrix multiplication, that is, count how many entry-by-entry
multiplications and additions are required to multiply an m-by-r matrix A
times an r-by-n matrix B. Each entry in the product AB is obtained by
forming a scalar product $a_i^R \cdot b_j^C$ of a row $a_i^R$ of A with a column $b_j^C$ of B.
Since $a_i^R$ and $b_j^C$ are both r-vectors, their product requires r multiplications
and r - 1 additions. There are mn entries to be computed in AB, each
requiring r multiplications and r - 1 additions. Thus in total we have


Theorem 1. The matrix product AB of the m-by-r matrix A and the
r-by-n matrix B requires mnr multiplications and mn(r - 1) additions.


Corollary A. (i) The matrix product of two n-by-n matrices requires $n^3$
multiplications and approximately $n^3$ additions.
(ii) The matrix-vector product Ax of an m-by-n matrix A times
an n-vector x requires mn multiplications and m(n - 1) additions.


Proof of (ii). Treat x as an n-by-1 matrix.


Corollary B. If the sizes (numbers of rows and columns) of two matrices
are doubled, the number of multiplications in the matrix product
increases by a factor of 8.


Proof. If A' is 2m-by-2r and B' is 2r-by-2n, then by Theorem 1, the
number of multiplications is (2m)(2n)(2r) = 8mnr.


The reader should verify Corollary B by squaring a 4-by-4 matrix and 
then squaring an 8-by-8 matrix on a computer. The second calculation should 
take eight times as long as the first. 
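
Rather than timing the two calculations, one can also count operations directly. The Python sketch below is a plain row-times-column multiplication instrumented with counters; the function name `multiply_and_count` and the all-1's test matrices are illustrative assumptions.

```python
def multiply_and_count(A, B):
    # Multiply an m-by-r matrix A by an r-by-n matrix B, counting operations.
    m, r, n = len(A), len(B), len(B[0])
    mults = adds = 0
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            total = A[i][0] * B[0][j]       # first term: one multiplication
            mults += 1
            for k in range(1, r):
                total += A[i][k] * B[k][j]  # one multiplication, one addition
                mults += 1
                adds += 1
            C[i][j] = total
    return C, mults, adds

def ones(rows, cols):
    return [[1] * cols for _ in range(rows)]

_, mults4, adds4 = multiply_and_count(ones(4, 4), ones(4, 4))
_, mults8, adds8 = multiply_and_count(ones(8, 8), ones(8, 8))
print(mults4, adds4)
print(mults8, adds8)
print(mults8 // mults4)
```

The counts match mnr and mn(r - 1) from Theorem 1, and doubling the size multiplies the number of multiplications by 8, as Corollary B states.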

A matrix is called sparse if most of its entries are 0. Although the
percentage of 0-entries to qualify as sparse is not defined precisely, most
people use the figure of 80%. Large matrices in practical problems often
have over 99% 0-entries. The point is that if a matrix is sparse, substantial
savings in computation should be possible by forming only nonzero products.

We shall consider two approaches for reducing the computation time 
in sparse matrix multiplication. The first approach is symbolic, using matrix 
algebra, and it also works on nonsparse matrices with special patterns. The 
second approach involves data structures to represent sparse matrices effi- 
ciently. 

First let us say a few words about the numerical stability of matrix 
multiplication. We want to know how much small errors, both errors in 
estimating the values of matrix entries and roundoff errors in computation, 
can influence the result of a single matrix multiplication or a sequence of 
multiplications, as in computing powers of a matrix. 

There is nothing inherently bad about a single matrix multiplication
except for the magnification of errors inherent in subtraction (if some terms
in a scalar product are positive and some negative). Subtraction can be
lethal if some small numbers in the data have only one or two significant
digits. For example, consider the scalar product


[245, -149] · [.2, .3] = 245 × .2 + (-149) × .3
                       = 49.0 - 44.7 = 4.3


The result of 4.3 is meaningless for the following reason. Since both the
terms .2 and .3 have only one-digit accuracy, 245 × .2 = 49.0 and
149 × .3 = 44.7 are only accurate to one significant digit. That is, the 9.0
in 49.0 and the 4.7 in 44.7 are essentially random numbers. Hence the
subtraction result, 49.0 - 44.7 = 4.3, has no meaning. An indication that
subtraction-induced error could have occurred is if the magnitude of an entry
in the matrix product is less than the magnitude of entries in the input
matrices.
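To see how wide the uncertainty really is, one can recompute the scalar product at the extremes of the rounding intervals. The sketch below assumes that a reported value of .2 stands for anything between .15 and .25, and .3 for anything between .25 and .35; that reading of "one-digit accuracy" is our assumption, and the helper name `scalar_product` is hypothetical.

```python
def scalar_product(u, v):
    return sum(x * y for x, y in zip(u, v))


a = [245, -149]
nominal = scalar_product(a, [0.2, 0.3])

# Extremes of the assumed rounding intervals [.15, .25] and [.25, .35].
low = scalar_product(a, [0.15, 0.35])    # smallest first term, largest second
high = scalar_product(a, [0.25, 0.25])   # largest first term, smallest second

print(round(nominal, 2), round(low, 2), round(high, 2))   # 4.3 -15.4 24.0
```

Under this assumption the true value could lie anywhere from about -15 to about 24, so even the sign of the computed 4.3 is not reliable.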

A long series of matrix multiplications may result in small errors
building up into large errors, just as large errors can occur in repeated scalar
multiplication. For example, if we need to multiply 1.15 times itself 10
times, the correct answer is $1.15^{10} = 4.045558$. But if we round 1.15 to
two significant digits as 1.1 or 1.2, we get answers of $1.1^{10} = 2.6$ and
$1.2^{10} = 6.2$.


Partitioning of Matrices 
Any vector a can be partitioned into two or more subvectors, such as

\[ a = [a_1, a_2, a_3] \tag{1} \]

For example, if a = [1, 2, 3, 0, 0, 0, 1, 2, 3] and if a' = [1, 2, 3] and
0 is the three-entry zero vector, we can write a as [a', 0, a'].
A matrix A can be partitioned into submatrices, such as

\[ A = [\, A_1 \mid A_2 \,] \tag{2} \]

or

\[ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \tag{3} \]
or 
A, 
A* | 
A=|— — A 
: (4) 
AY Ae he. 


For example, we might partition 
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raed AG 4 

| A, 
23 £555 1 ie re bees 
a 4S 64 OOP _ 
a 2S. oo! Bi Oe 
rie OO) 4% 4 

A A 

ta een! thot 1 | Ao | A 


where $A_1$ is a 2-by-2 matrix of 1's and $A_0$ is a 2-by-2 matrix of 0's.


The partition of a matrix A will typically correspond to different
components of the underlying model. A partition of A in the form of (3) would
arise in a Markov chain transition matrix if the states divided in some natural
way into two groups, group $S_1$ and group $S_2$. The partition in (2) arises
naturally if the columns of A represent two different types of variables. For
example, the Leontief supply-demand equations (see Example 3 of Section 2.2)
were written in matrix notation as


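\[
x = Dx + c
\quad\text{or}\quad
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix}
.4 & .2 & .2 & .2 \\
.3 & .3 & .2 & .1 \\
.1 & .1 & 0 & .2 \\
0 & .1 & .1 & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
+
\begin{bmatrix} 100 \\ 50 \\ 100 \\ 0 \end{bmatrix}
\tag{5}
\]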


The right-hand-side numbers in these equations could be combined into one 
expanded matrix by appending c as the last column: 


\[ D' = [\, D \mid c \,] \tag{6} \]


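Correspondingly, we expand x by adding an additional entry with value 1,
x' = [x, 1]. Now (5) becomes


\[
x = D'x'
\quad\text{or}\quad
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix}
.4 & .2 & .2 & .2 & 100 \\
.3 & .3 & .2 & .1 & 50 \\
.1 & .1 & 0 & .2 & 100 \\
0 & .1 & .1 & 0 & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ 1 \end{bmatrix}
\tag{7}
\]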


Example 1. Partitioning the Adjacency Matrix of a Graph


Figure 2.4
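The graph G in Figure 2.4 has the following adjacency matrix:


\[
A(G) =
\begin{array}{c|cccc|cccc}
  & a & b & c & d & e & f & g & h \\ \hline
a & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
b & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\
c & 1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
d & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 \\ \hline
e & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\
f & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 \\
g & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 \\
h & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0
\end{array}
\tag{8}
\]

A(G) has the nice partitioned form

\[
A(G) = \begin{bmatrix} A' & I \\ I & A' \end{bmatrix}
\quad\text{where}\quad
A' = \begin{bmatrix}
0 & 1 & 1 & 0 \\
1 & 0 & 1 & 1 \\
1 & 1 & 0 & 1 \\
0 & 1 & 1 & 0
\end{bmatrix}
\tag{9}
\]

and I is the 4-by-4 identity matrix. ■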


Partitioning is very useful in matrix multiplication because we can 
initially treat the submatrices like scalar entries. For example, if 


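\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
\tag{10}
\]

then

\[
AB = \begin{bmatrix}
A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\
A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22}
\end{bmatrix}
\tag{11}
\]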


Verification of (11) is left as an exercise. Of course, (11) requires that the
number of columns in the A submatrices equal the number of rows in the
appropriate B submatrices. The situation with partitioning is similar to all
the matrix algebra rules presented in Section 2.4. Unless it is expressly
prohibited, anything you would like to be true about partitioning is probably
true.

If some of the submatrices of A and B have nice forms (e.g., 0 or I),
the amount of work needed to compute the matrix product AB is greatly
reduced.
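Formula (11) can be tried out numerically. The Python sketch below is only an illustration under our own assumptions: the helper names (`split`, `join`, `block_matmul`) and the 3-by-3 test matrices are hypothetical, and the blocks are deliberately of unequal size to show that only the row/column compatibility condition matters.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matadd(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M, r, c):
    """Partition M into four blocks: first r rows / first c columns and the rest."""
    return ([row[:c] for row in M[:r]], [row[c:] for row in M[:r]],
            [row[:c] for row in M[r:]], [row[c:] for row in M[r:]])


def join(M11, M12, M21, M22):
    return ([a + b for a, b in zip(M11, M12)] +
            [a + b for a, b in zip(M21, M22)])


def block_matmul(A, B, r, s, c):
    """Formula (11): A is split after row r and column s, B after row s and column c."""
    A11, A12, A21, A22 = split(A, r, s)
    B11, B12, B21, B22 = split(B, s, c)
    return join(matadd(matmul(A11, B11), matmul(A12, B21)),
                matadd(matmul(A11, B12), matmul(A12, B22)),
                matadd(matmul(A21, B11), matmul(A22, B21)),
                matadd(matmul(A21, B12), matmul(A22, B22)))


A = [[1, 2, 0], [3, 4, 1], [0, 1, 5]]
B = [[2, 0, 1], [1, 3, 0], [4, 1, 2]]
print(block_matmul(A, B, 1, 2, 1) == matmul(A, B))   # True
```

Note that the column split of A and the row split of B use the same index s; this is the compatibility requirement stated above.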


Example 1 (continued). Powers of a Partitioned Adjacency Matrix


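Let us use (11) to compute the square of the partitioned adjacency
matrix A(G) in (9).

\[
A^2(G)
= \begin{bmatrix} A' & I \\ I & A' \end{bmatrix}
  \begin{bmatrix} A' & I \\ I & A' \end{bmatrix}
= \begin{bmatrix} A'A' + II & A'I + IA' \\ IA' + A'I & II + A'A' \end{bmatrix}
= \begin{bmatrix} A'^2 + I & 2A' \\ 2A' & A'^2 + I \end{bmatrix}
\tag{12}
\]

Computing $A'^2$ can be done by inspection faster than entering the
numbers in a computer program. It just involves counting how many
1-entries each pair of rows in A' have in common (this was explained
in Example 1 of Section 2.3). So

\[
A'^2 = \begin{bmatrix}
2 & 1 & 1 & 2 \\
1 & 3 & 2 & 1 \\
1 & 2 & 3 & 1 \\
2 & 1 & 1 & 2
\end{bmatrix}
\quad\text{and}\quad
A'^2 + I = \begin{bmatrix}
3 & 1 & 1 & 2 \\
1 & 4 & 2 & 1 \\
1 & 2 & 4 & 1 \\
2 & 1 & 1 & 3
\end{bmatrix}
\tag{13}
\]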


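Inserting $A'^2 + I$ and 2A' into the partition product in (12), we obtain

\[
A^2(G) =
\begin{array}{c|cccc|cccc}
  & a & b & c & d & e & f & g & h \\ \hline
a & 3 & 1 & 1 & 2 & 0 & 2 & 2 & 0 \\
b & 1 & 4 & 2 & 1 & 2 & 0 & 2 & 2 \\
c & 1 & 2 & 4 & 1 & 2 & 2 & 0 & 2 \\
d & 2 & 1 & 1 & 3 & 0 & 2 & 2 & 0 \\ \hline
e & 0 & 2 & 2 & 0 & 3 & 1 & 1 & 2 \\
f & 2 & 0 & 2 & 2 & 1 & 4 & 2 & 1 \\
g & 2 & 2 & 0 & 2 & 1 & 2 & 4 & 1 \\
h & 0 & 2 & 2 & 0 & 2 & 1 & 1 & 3
\end{array}
\tag{14}
\]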


Using partitioning, the only matrix product we had to calculate was
$A'^2$, a 4-by-4 problem. By Corollary B, this is an eightfold savings
over computing $A^2(G)$ directly.

Suppose that we want to compute higher powers of A(G). The
result will not be as nice, since the four 4-by-4 submatrices of $A^2(G)$
are not as simple as those of A(G). Still the problem is easier than
multiplying without partitioning. The computation of $A^3(G)$ using (11)
is left as an exercise. ■
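The block computation in this example can be checked against a direct 8-by-8 multiplication. The Python sketch below is illustrative only; it assumes that A' is the adjacency matrix of the four-node subgraph with edges a-b, a-c, b-c, b-d, and c-d (our reading of the example), and the variable names are our own.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Assumed A': edges a-b, a-c, b-c, b-d, c-d among nodes a, b, c, d.
Ap = [[0, 1, 1, 0],
      [1, 0, 1, 1],
      [1, 1, 0, 1],
      [0, 1, 1, 0]]
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

# A(G) = [[A', I], [I, A']] assembled row by row.
AG = [ra + ri for ra, ri in zip(Ap, I4)] + [ri + ra for ri, ra in zip(I4, Ap)]

# Partitioned square: only the 4-by-4 product A'A' is actually multiplied out.
Ap2 = matmul(Ap, Ap)
diag = [[Ap2[i][j] + I4[i][j] for j in range(4)] for i in range(4)]   # A'^2 + I
off = [[2 * x for x in row] for row in Ap]                            # 2A'
via_blocks = ([d + o for d, o in zip(diag, off)] +
              [o + d for o, d in zip(off, diag)])

print(via_blocks == matmul(AG, AG))   # True
```

The direct product uses 8 × 8 × 8 = 512 multiplications, while the partitioned version multiplies out only a 4-by-4 square (64 multiplications).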


The pattern of nonzero entries in A(G) is typical of patterns found in
a Markov chain transition matrix, a Leontief supply-demand matrix, or a
constraint matrix in a linear program, where there are interrelated clusters
of states or industries, such as node sets {a, b, c, d} and {e, f, g, h} in G,
with a small number of links between states in different clusters. See the
Exercises for specific examples.
Data Structures for Sparse Matrices (Optional)


Various schemes can be used to store a sparse matrix (with most entries 
equal to 0). The objectives are 


1. To minimize the space needed to store the matrix by storing only the 
nonzero entries. 

2. To speed matrix multiplication (and other matrix calculations) by en- 
abling us to compute only those terms in the product that will be nonzero. 


There are two general categories of sparse matrices. The first type of
sparse matrix is a matrix in which the nonzero entries form a particular
pattern. An example of such a pattern is the transition matrix A of the frog
Markov chain.


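\[
A = \begin{bmatrix}
.50 & .25 & 0 & 0 & 0 & 0 \\
.50 & .50 & .25 & 0 & 0 & 0 \\
0 & .25 & .50 & .25 & 0 & 0 \\
0 & 0 & .25 & .50 & .25 & 0 \\
0 & 0 & 0 & .25 & .50 & .50 \\
0 & 0 & 0 & 0 & .25 & .50
\end{bmatrix}
\tag{15}
\]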


All the nonzero entries in A are grouped around the main diagonal. A
matrix with such a pattern is called a band matrix. The bandwidth of a
band matrix is the smallest number w such that if $a_{ij}$ is a nonzero entry,
then |i - j| < w. The bandwidth of the matrix A in (15) is 2.

In the ith row of a band matrix with bandwidth w, the nonzero entries
occur in positions i - w + 1 through i + w - 1. For obvious reasons,
band matrices such as A with w = 2 are called tridiagonal matrices. Band
matrices arise in many different settings.
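The definition of bandwidth translates directly into a short computation. In the Python sketch below, the function name `bandwidth` and the 6-by-6 test matrix T (a hypothetical tridiagonal matrix, not one taken from the text) are our own; a matrix with no nonzero entries is given bandwidth 0 by convention.

```python
def bandwidth(A):
    """Smallest w such that |i - j| < w for every nonzero entry a_ij."""
    widest = max((abs(i - j)
                  for i, row in enumerate(A)
                  for j, x in enumerate(row) if x != 0),
                 default=-1)
    return widest + 1


# Hypothetical tridiagonal example: .5 on the diagonal, .25 just off it.
T = [[0.5 if i == j else 0.25 if abs(i - j) == 1 else 0 for j in range(6)]
     for i in range(6)]
print(bandwidth(T))   # 2
```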

The matrix in Example 11 of Section 2.2 for filtering digital pictures 
was the following 12-by-12 tridiagonal matrix: 


: #- O40 12 0. O°0° 0 0° 0:0 
A224 900000000 0 
G 254-420 00 00: 0.0 ‘0 
0 OR Se Be Oe 0-0 
0 8 O72 he Oa 0°68 8-0 
p-}9 99085 80000 0 
7.0°O OFF) 9 9 4°0.0 8 0 
b Oe Om 8 4 O08 
YO 6 -O..0. 0-00 3.2 28: 0 
10 O O90. 0° 6. 468.2 @ 
0-60.70: 60-8 0-0 2 8.28 
GC 6°08 OO ono 8 O°O 4 
In Section 4.7 a tridiagonal 100-by-100 matrix arises when we approximate
a differential equation with a finite-difference system.

The natural way to store a band matrix A is to store just the subvectors
$a_i^*$ of nonzero entries in each row, so that the ith row of A is of the form
$a_i^R = [0, a_i^*, 0]$. For example, for the frog Markov chain
$a_3^* = a_4^* = [.25, .5, .25]$. We also must store the subvectors of nonzero
entries in each column.


When we are multiplying two band matrices A and B together (with
bandwidths w and w', respectively), there will be many cases where the
bands of A and B do not overlap, so that the scalar product of row i of A
and column j of B will be 0. This will happen if i + w - 1 ≤ j - w' or if
j + w' - 1 ≤ i - w. [Check this for the case where A and B both equal the
matrix in (15).] When the bands do overlap, the lower bound for k in the
summation will be max(1, i - w + 1, j - w' + 1); finding the upper bound is
left as an exercise.
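The summation limits can be put to work as follows. This Python sketch is an illustration, not the text's own algorithm: the function name `band_matmul` is hypothetical, the matrices are stored in full rather than as band subvectors, and the upper limit used is our own answer to the exercise, obtained by mirroring the lower limit.

```python
def band_matmul(A, B, w, wp):
    """Product of n-by-n band matrices A (bandwidth w) and B (bandwidth wp).

    Indices i, j, k are 1-based as in the text; the lists are 0-based.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            lo = max(1, i - w + 1, j - wp + 1)
            hi = min(n, i + w - 1, j + wp - 1)   # assumed upper limit
            for k in range(lo, hi + 1):          # empty when the bands miss
                C[i - 1][j - 1] += A[i - 1][k - 1] * B[k - 1][j - 1]
    return C


n = 6
T = [[2 if i == j else 1 if abs(i - j) == 1 else 0 for j in range(n)]
     for i in range(n)]
full = [[sum(T[i][k] * T[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)]
print(band_matmul(T, T, 2, 2) == full)   # True
```

Every term skipped by the restricted range has a zero factor, which is why the result agrees with the full triple loop.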

Note that in calculating powers of the transition matrix A in (15), we
could compute $A^2$ as the product of two tridiagonal matrices. However, the
result $A^2$ is not tridiagonal, so to compute $A^3 = AA^2$ and higher powers
we would be multiplying a tridiagonal matrix A times a regular matrix.


The second type of sparse matrix data structure is for a matrix whose 
nonzero entries are randomly located. 


Example 2. Squaring a Sparse Matrix


Consider the example of the 16-by-16 0-1 matrix M in (16) that has
only 32 1's among its 16 × 16 = 256 entries (about 12%). M
partitions into four identical 8-by-8 matrices, which we call N.


Pet a ee oe A ey D EF G 
1fo 00 00100/0000010 0 
210 0°10 GOLS4116 & £8 Bea 
31.0 0 0:0 Geo C16 1e & 10-0) B O& oO Bo 
410. 0 00-0 Oe 116 “Of oO. ‘a Gri 
Shr OO 0-7 2 ln 0 a a ore 
Sb 8 FO OO ahi @ ao Se 7.8 
ae O10 9.0 1 OPO AOR Eee 

Mt = Pam: ee ee 16) 
cn 0 ebsites a a va 
A100 1°0°0 0) alo GO 1 0.6 GO 4 
B10 0000000!0 000000 0 
clo0000001100000001 
Hi! OHO Fao] Oo 4 0 F © 0 8 
EP Pare Fie Go OT Ooh Ee 0 
RA O10 O48. 6S PO. dh 8 OS Ge 
Glo0000000/]0 0000000 
To limit row and column names to one symbol, we use the notation
A = 10, B = 11, C = 12, D = 13, E = 14, F = 15, G = 16 in
(16).

For each row and each column of N, we make lists $R_i$ and $C_j$ of
the nonzero positions. For example, the first few row lists for N would
be $R_1$ = (6), $R_2$ = (3, 8), $R_3$ = ( ).

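Suppose that we want to compute the square of M. In terms of N,

\[
M^2
= \begin{bmatrix} N & N \\ N & N \end{bmatrix}
  \begin{bmatrix} N & N \\ N & N \end{bmatrix}
= \begin{bmatrix} N^2 + N^2 & N^2 + N^2 \\ N^2 + N^2 & N^2 + N^2 \end{bmatrix}
= \begin{bmatrix} N' & N' \\ N' & N' \end{bmatrix}
\tag{17}
\]

where $N' = 2N^2$. So to square M we only have to compute the square
of N.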

Recall that when we multiply N times N, entry (i, j) in $N^2$ is
simply the number of positions where the row $n_i^R$ and column $n_j^C$ of
N are both 1 (see Example 1 of Section 2.3). To determine how many
1's $n_i^R$ and $n_j^C$ have in common, we simply look at our lists of
1-entries for $n_i^R$ and $n_j^C$ and see how many positions these two lists
have in common. Notice that no actual multiplication ever occurs in
finding the matrix product of two sparse 0-1 matrices. Using the lists
of 1-entries to perform the matrix multiplication typically requires only
one-tenth the time of normal matrix multiplication, when multiplying
two 15-by-15 0-1 matrices with around 10% 1-entries (savings are
greater for larger or sparser matrices).

Using lists of l-entries, we compute 


0 OO. 0 LO @ DO Ce Wb “OQ 
OP, SP ar OD) “ie Bight Ose Oey Oo 7 
SRS oO Bh 0 og eg OS 8 oe Og 
N? = 010 000 0 0 J oo. @T Oh 8 21 
ria 0 o@ 0 CG 8 0 roo & Oo 6 0 6 
mee 1. Oo 8D Pe. ? f° eC Dp D 
OO. yO TF & Ea ee Ge pr ok eB .G 
010 0000 0 0 50: Os Or TF OG ® 
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ee Be ir Der a ag (18) 
Ce Coe & wed oO 
Gre 0°86: @ &@ 6:0 
070 00 0 0 0 0 
OLS OO. O 4°@, 0 
on? & Oo. 8 bP 84 
pao 0 & @& 0 2 8 
iA Or OO 1) 0 
The entries in $M^2$ are obtained from $N^2$ as shown in (17). ■
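The list-based product of 0-1 matrices described in this example can be sketched in a few lines of Python. The version below is an illustration under our own assumptions: positions are 0-based, each list is held as a Python set so that "positions in common" is a set intersection, and the 4-by-4 matrix N is a small hypothetical stand-in, not the matrix N of (16).

```python
def row_lists(N):
    """R_i: the set of positions of 1-entries in row i."""
    return [{j for j, x in enumerate(row) if x} for row in N]


def col_lists(N):
    """C_j: the set of positions of 1-entries in column j."""
    return [{i for i in range(len(N)) if N[i][j]} for j in range(len(N[0]))]


def sparse01_product(R, C):
    """Entry (i, j) is the number of positions R_i and C_j have in common."""
    return [[len(r & c) for c in C] for r in R]


# Hypothetical 0-1 matrix used only for illustration.
N = [[0, 0, 1, 0],
     [1, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 1, 1, 0]]
N2 = sparse01_product(row_lists(N), col_lists(N))
print(N2)   # [[0, 0, 0, 0], [0, 1, 2, 0], [0, 0, 0, 0], [1, 0, 0, 1]]
```

As the text notes, no multiplication occurs anywhere: the product is found purely by comparing lists.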


If the nonzero entries in M were not always 1, then for each position
in a row or column list we must record the number together with its position.
For example, if the first three rows of M were


\[
\begin{array}{c|cccccccccccccccc}
  & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & A & B & C & D & E & F & G \\ \hline
1 & 0 & 0 & 0 & 0 & 0 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 7 & 0 & 0 & 0 & 0 & -2 & 0 & 0 & 0 \\
3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 9 & 0 & 0 & 0
\end{array}
\]


then the first three row lists would each be a pair of lists: $R_1$ = (6),
$R_1'$ = (4); $R_2$ = (3, 8, D), $R_2'$ = (2, 7, -2); $R_3$ = (A, D),
$R_3'$ = (1, 9), where $R_i$ is the list of nonzero positions in row i and
$R_i'$ is the list of values of these nonzero positions.


When we multiply two sparse matrices M and N that are not 0-1
matrices, we start with the procedure above of comparing lists for $m_i^R$ and
$n_j^C$ to find out in which positions $m_i^R$ and $n_j^C$ are both nonzero.
But now we multiply the two numbers when two nonzero positions match and add up
all such products to obtain entry (i, j) in MN.
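For matrices whose nonzero entries are not all 1, each list must carry values as well as positions. One possible sketch in Python keeps each row or column as a dictionary mapping position to value; this dictionary representation, the function names, and the two 3-by-3 test matrices are our own assumptions rather than the paired lists used in the text.

```python
def sparse_rows(M):
    """For each row, a dict {position: value} of its nonzero entries."""
    return [{j: x for j, x in enumerate(row) if x != 0} for row in M]


def sparse_cols(M):
    """For each column, a dict {position: value} of its nonzero entries."""
    return [{i: M[i][j] for i in range(len(M)) if M[i][j] != 0}
            for j in range(len(M[0]))]


def sparse_product(R, C):
    """Entry (i, j): sum of products over positions where both lists are nonzero."""
    return [[sum(v * c[k] for k, v in r.items() if k in c) for c in C]
            for r in R]


# Hypothetical matrices used only for illustration.
M = [[0, 4, 0],
     [2, 0, -2],
     [0, 0, 9]]
N = [[1, 0, 0],
     [0, 0, 3],
     [5, 0, 1]]
print(sparse_product(sparse_rows(M), sparse_cols(N)))
# [[0, 0, 12], [-8, 0, -2], [45, 0, 9]]
```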

The main work in multiplying two sparse matrices stored in this
fashion is comparing the lists $R_i$ and $C_j$ for matching positions (there is
lots of testing for matches and few cases of an actual match when we
multiply). If list $R_i$ has s positions (of nonzero entries) and $C_j$ has t
positions, the two lists can be checked for all possible matches with s + t - 1
comparisons (this basic fact about data structures is left as an exercise). If
A is an m-by-r matrix with probability $p_1$ of an entry being nonzero, then
on average, row $a_i^R$ has $p_1 r$ nonzero entries. Similarly, if B is an
r-by-n matrix with probability $p_2$ of an entry being nonzero, then, on
average, column $b_j^C$ has $p_2 r$ nonzero entries (r is the size of $a_i^R$
and $b_j^C$). So on average, the scalar product $a_i^R \cdot b_j^C$ requires
$p_1 r + p_2 r - 1$ comparisons.

We have to perform mn such scalar products to obtain all entries in
the product AB, so in total this is
$mn(p_1 r + p_2 r - 1) = mnr(p_1 + p_2 - 1/r)$ comparisons. Recall that mnr
operations (entry-by-entry multiplications) are required for normal matrix
multiplication. For example, if $p_1 = p_2 = .1$ and r = 15, then
$(p_1 + p_2 - 1/r) = .13$, so the computational effort will be around
one-eighth as much using these data structures.
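The s + t - 1 bound comes from scanning two position lists in increasing order, advancing past the smaller position at each step. The Python sketch below assumes the lists are sorted and counts each three-way test (equal, smaller, larger) as one comparison; the function name `match_sorted` and the sample lists are hypothetical.

```python
def match_sorted(r, c):
    """Return (matching positions, number of comparisons) for two sorted lists."""
    i = j = comparisons = 0
    matches = []
    while i < len(r) and j < len(c):
        comparisons += 1
        if r[i] == c[j]:
            matches.append(r[i])
            i += 1
            j += 1
        elif r[i] < c[j]:
            i += 1
        else:
            j += 1
    return matches, comparisons


print(match_sorted([3, 8, 13], [2, 8, 9, 13]))   # ([8, 13], 5)
print(match_sorted([1, 3, 5], [2, 4, 6]))        # ([], 5)  worst case: 3 + 3 - 1

# The fraction of the normal effort in the example with p1 = p2 = .1, r = 15.
print(round(0.1 + 0.1 - 1 / 15, 2))              # 0.13
```

Each pass through the loop advances at least one list, and the loop stops as soon as either list is exhausted, so it can run at most s + t - 1 times.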


The methods presented here make it possible to construct a mathematical
model with a 100,000-by-20,000 matrix to, say, describe the distribution
and use of all forms of energy by all sectors in the U.S. economy.
Often, no more than 1 in 1000 of the entries in such a matrix is nonzero;
further, the matrix can be partitioned extensively. The combination of
partitioning and data structures can reduce the time of sparse matrix operations
by a factor of 1,000,000.


Section 2.6 Exercises 


Summary of Exercises 

Exercises 1 and 2 involve the speed of matrix multiplication. Exercises 3-15
deal with partitioned matrices. Exercises 16-21 involve band matrices, and
Exercises 22 and 23 involve sparse matrices.


1. How many multiplications are required to perform the following matrix
operations?
(a) Square a 10-by-10 matrix.
(b) Square a 100-by-100 matrix.
(c) Multiply a 20-by-5 matrix times a 5-by-20 matrix.
(d) Multiply a 5-by-20 matrix times a 20-by-5 matrix.
(e) Cube a 10-by-10 matrix.
(f) Multiply a 10-by-10 matrix times itself 10 times.


2. (a) Suppose that you had to compute the product ABC, where A is
8-by-4, B is 4-by-8, and C is 8-by-5. You can either multiply
A times B and then multiply the product AB times C, or you can
multiply B times C and then multiply A times the product BC.
How many operations are involved each way in computing ABC?
Which way is faster?

(b) Repeat part (a) for the product ABC, where A is 10-by-8, B is
8-by-4, and C is 4-by-6.


3. Partition the following matrices into appropriate submatrices. 


(a) 


ok 2 ato hy a eet. wee Toe Ty 
Sat we eee Ce re et. OYE 
i Bub? Peer zag Pe ae BR Pe De 
tS Ae ree b ety RR et he Wes B.D 
Eke ee ee ke ™ Coa ie ee ee I Gy 
oO ee oe hed Beer. Zl Wet of 
ie ae a ef ee ee OD DD 
Lath” 2 Ret Se a Bare oe ee ee 4 
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be Se ee ee oe We 2 eee, 
Dh ee OE a oe 22 Se 3 
(c) 2) ae oO (d) Paes f Bs 0 ag 
2 2 fr ee a ee Os | mh Oa 2 Te So A 
Cae ee Rey Re 28) Fo BO 
Pee 2 eb ee pre Fh Oh Pot Ds 


4. Write the following systems of equations in matrix notation as 
x = Dx*:; define D, x, and x*. 
(a) x, = 3x, + 4x, + 100 (b) x, = Xx + .3x, — .4%, + 100 
X5 = 2x, -_ 3x, + 200 X> —_ X>5 os 2X4 + 3X7 4 200 
(c) py = 2p, — pz + 100 
P> = Sp, + 3p2 + 50 
L = p, + Pr. 


5. In equation (7), alter D' by adding another row so that the equation has
the form x' = D''x'.


6. Write the adjacency matrices of the following graphs and define a par- 
titioned form of the matrices. 


(a) G, bh e AR ag 


, 
> 
7. In the maze, a person in any given room has equal chances of leaving
by any door out of the room (but never remains in a room). Write the
Markov transition matrix for this maze. Write the matrix in partitioned
form.


8. In a Leontief economic model, we might consider three commodities
A, B, C in two different countries. Suppose that there is the following
interindustry demand matrix D for the three commodities in each country.


In addition, to produce a dollar's worth of each commodity in the second
country requires as input .1 dollar of commodity A from country 1.
Similarly, each first country commodity requires .1 dollar of the second
country's commodity A. The consumer demand in the first country for
the three commodities is [50, 100, 50], and the consumer demand in
the second country is [100, 200, 100].

Write out the system of equations for this Leontief model. Also
write the right side of these equations as D'x'; define D'.


9. Determine the square of the matrix in Exercise 3, part (b).


10. Determine AA' for the matrix in Exercise 3, part (a) (A' denotes the
transpose of A, with rows and columns interchanged).


11. (a) Partition the matrix


eo © 6:60.60 QQ: = &@ = 
a Ss. | © oo oa = & © 
Se o eo own oO OC O 
So 68 @onNn Cn Cc CO © 
SCo ON NN OC CO O 
Oo ws oQ eo oo 2 2c @& 
wowdeodad 2 2 @ @ 
Wwoqgedjcreoeo fd © 


in terms of the matrix 


and the zero matrix 0. 
(b) Write $A^2$ and $A^3$ in partitioned form in terms of B and 0.

(c) Write out A* entry by entry. How many multiplication operations 
are required to write out A* using the advantages of the partitioned 
form? How many multiplication operations if A* was done nor- 
mally? 

(d) Compute A*v where v = [1, —1, 1, 0, 1,0, 1, —1, 1). 


12. Compute the square of each adjacency matrix in Exercise 6 using the
partitioned form of the matrix.


13. Suppose that the adjacency matrix A(G) of a graph G has the partitioned
form
_| O 
rc es cf 


where $J_3$ is a 3-by-3 matrix with each entry 1.
(a) Draw G.
(b) Write out all the entries in $A^2(G)$.


14. Determine the partitioned form of $A^3(G)$ in Example 1 [in terms of A'
and I, just as $A^2(G)$ is expressed in (12)].


15. (a) Let


» 2. g Bez 
A =i-f 3° } and B=13 4 
OR HE a, 


Partition A into three 3-by-1 submatrices and B into three 1-by-2 
submatrices, and use this partition to compute the product AB. 
Compute the matrix product AB the normal way and compare the 
arithmetic in the two methods. 

(b) Extend part (a) to show that in any matrix product AB, the 
m-by-r matrix A can be partitioned into r submatrices each con- 
sisting of one of A’s columns and the r-by-n matrix B partitioned 
into r submatrices each consisting of one of B’s rows. Explain in 
words the effect of this partitioned product [generalize the com- 
parison you made in part (a)]. 


16. Compute the square of the frogger transition matrix in (15). 
17. Compute the square of the filtering matrix F.


18. Compute the square of the following band matrices.
"4. Be BY Og ‘Oo f°? 2.208 
Oem 27 WG D2 OO Tees 
(a) PP Oe eat) SD YY Be 2 Oa see 
YO -O 2 b 4 aig bo 2 8 8 8 
a> Oa 27 O Dest “O27 see 
ye 8 o Oo - 2 ho oD TO. 28 
POD 5G” FH «Be ad 
19. Suppose that in the frog Markov chain there were 12 lanes in the
superhighway, not 4. In addition, there are still the left and right sides of
the road.

(a) Describe the new Markov transition matrix by telling the bandwidth
and the values of the entries on, and just off, the main diagonal.
The entries in the first and fourteenth columns should also be
described.

(b) Describe the square of this Markov transition matrix in the same
terms as given in part (a).


20. Suppose that we define a sequence of values $a_1, a_2, a_3, \ldots, a_8$
such that the weighted running average $(a_{i-1} + 2a_i + a_{i+1})/4 = 5i$, for
i = 2, 3, ..., 7, and for i = 1 we use $(a_1 + a_2)/2 = 5$ and similarly
for i = 8, $(a_7 + a_8)/2 = 35$.
(a) Write out this system of equations for the unknown $a_i$.
(b) Given that a, should be 35, solve the system in part (a). 


21. What is the bandwidth of the square of a k-bandwidth matrix?


22. (a) Compute the squares of the following sparse matrices. 


C O-0 O10 , 4.06 6.0 0 
000100 001000 
« leh pr or COO 10.0 6? 0.0 
Oli o00001 O10 0001 0 
100000 000001 
00.16 0 0 1000 6 6°60 


(b) Compute the cubes of the matrices in part (a). 


23. (a) Give a partitioned form of this 12-by-12 sparse matrix. 
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2 Oo 230) 8 6-0 & 2.6 oe -3 
c2.ah oe 0 8 OD 1 0 
Ge Ss the 2 Re OR - Oa e 1. 1G 
70-0. 2 O° 2° 0 Po O-00-8 
240 O90 2.0 D> ft & GOO 
w&2 0 0 fF 2-2-8 OB Ooh a 
Peo O08 2 OD O02 a 
oO OO dO Os. 2 B-H 
Pie fe Fob ee 2h 8 
Ve foe tf Oo. 0 8 2 be3 
ure 8&0 8.2 0 6 6-2 8 
io 6 0-0 .0°0. 2 8 0-24 


(b) Compute the square of this matrix using the partitioned form and 
sparse matrix multiplication. 

(c) Give a formula for the nth power of the upper right 6-by-6 sub- 
matrix. 


Solving Systems of Linear Equations


Section 3.1. Solving Systems of Equations with Determinants


In this chapter we discuss the central mathematical problem of linear models, 
solving a system of linear equations. The models introduced in previous 
chapters will be used to motivate and illustrate the computational techniques 
and mathematical concepts. Much of the theory about solutions to systems 
of linear equations will be delayed until Chapter 5. First, in Chapter 4, we
use the methods from this chapter to solve systems of linear equations arising
in various applications.

In this first section we consider an algebraic approach involving de- 
terminants for solving a system of linear equations. Determinants produce a 
useful formula for solving two equations in two unknowns. Although the 
method does not yield efficient computational schemes for larger systems, 
it does yield important information about when such systems have solutions 
and about eigenvalues of the coefficient matrix. A more general method for 
solving linear equations is discussed in Section 3.2. 

Recall that the quadratic equation $ax^2 + bx + c = 0$ has the solutions
$x = (1/2a)(-b \pm \sqrt{b^2 - 4ac})$. We seek similar formulas for solving a
system of linear equations. Consider the following system of two equations
in two unknowns:

\[
\begin{aligned}
ax + by &= e \\
cx + dy &= f
\end{aligned}
\tag{1}
\]

Let us solve (1) for x and y. Multiplying the first equation by d and the
second by b and then subtracting, we obtain

\[
\begin{aligned}
adx + bdy &= de \\
-(bcx + bdy &= bf) \\
(ad - bc)x &= de - bf
\end{aligned}
\]


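Solving for x, we have

\[ x = \frac{de - bf}{ad - bc} \tag{2} \]

Substituting (2) in the first equation of (1) and simplifying, we obtain

\[ y = \frac{af - ce}{ad - bc} \tag{3} \]

Formulas (2) and (3) give us immediate solutions to any system of two
equations in two unknowns. For example, the system

\[
\begin{aligned}
2x - 3y &= 4 \\
x + 2y &= 9
\end{aligned}
\tag{4}
\]

is solved using (2) and (3) (with a = 2, b = -3, c = 1, d = 2, e = 4,
f = 9).

\[
x = \frac{de - bf}{ad - bc}
  = \frac{2 \times 4 - (-3) \times 9}{2 \times 2 - (-3) \times 1}
  = \frac{35}{7} = 5
\]

\[
y = \frac{af - ce}{ad - bc}
  = \frac{2 \times 9 - 1 \times 4}{2 \times 2 - (-3) \times 1}
  = \frac{14}{7} = 2
\]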

By using various techniques, it is possible to extend these formulas to
obtain expressions for the solutions to three equations in three unknowns
and more generally to n equations in n unknowns. However, these expressions
become huge, and evaluating them takes far longer than Gaussian elimination
(which is presented in Section 3.2).

Formulas (2) and (3) have important theoretical uses. The critical part
of the formulas is their denominators, which are the same: ad - bc. This
denominator is called the determinant of the system. It turns out that the
expressions for the solution of three equations in three unknowns also have
common denominators. This result holds in general for the solution of any
system of n equations in n unknowns. The determinant of the n-by-n matrix
A is defined to be the denominator in the algebraic expressions for the
solution of the system Ax = b. Formulas such as (2) and (3) give a unique
solution to the system of equations, provided that the determinant is nonzero.
The determinant is like the expression $b^2 - 4ac$ under the square-root
sign in the quadratic formula; recall that $b^2 - 4ac$ is called the discriminant.
The discriminant must be nonnegative for the quadratic equation to have
real solutions. Here the determinant must be nonzero for a unique solution.
Rewriting system (1) with matrix subscripts, we have


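\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 &= b_1 \\
a_{21}x_1 + a_{22}x_2 &= b_2
\end{aligned}
\]

or in matrix notation,

\[ Ax = b \]

where

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},
\quad
b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix},
\quad
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\]

Now (2) and (3) become

\[
x_1 = \frac{a_{22}b_1 - a_{12}b_2}{a_{11}a_{22} - a_{12}a_{21}},
\qquad
x_2 = \frac{a_{11}b_2 - a_{21}b_1}{a_{11}a_{22} - a_{12}a_{21}}
\tag{5}
\]

The determinant det(A) of the 2-by-2 matrix A is written as

\[
\det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}
= a_{11}a_{22} - a_{12}a_{21}
\tag{6}
\]

In the 2-by-2 case, det(A) is simply the product of the two main
diagonal entries minus the product of the two off-diagonal entries. Since
every square matrix can be interpreted as the coefficient matrix of a system
of linear equations, every square matrix has a determinant.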

We can write the numerators in the expression for $x_1$ and $x_2$ as
determinants of the matrices obtained by replacing the first and second columns,
respectively, of A by the vector b. That is, let

\[ A_1 = [\, b \;\; a_2^C \,] \quad\text{and}\quad A_2 = [\, a_1^C \;\; b \,] \tag{7} \]

Then

\[
\begin{aligned}
\det(A_1) &= \begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}
           = a_{22}b_1 - a_{12}b_2 \\
\det(A_2) &= \begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}
           = a_{11}b_2 - a_{21}b_1
\end{aligned}
\tag{8}
\]

The expressions in (8) are exactly the numerators in (5). So using $\det(A_1)$
and $\det(A_2)$, our formulas for $x_1$ and $x_2$ are

\[
x_1 = \frac{\det(A_1)}{\det(A)},
\qquad
x_2 = \frac{\det(A_2)}{\det(A)}
\]


The numerators in systems of n equations in n unknowns turn out to have
the same form as for two equations. That is, if we define $A_i$ to be the matrix
obtained from A by replacing the ith column $a_i^C$ by the right-hand-side vector
b,

\[ A_i = [\, a_1^C, a_2^C, \ldots, a_{i-1}^C, b, a_{i+1}^C, \ldots, a_n^C \,] \tag{9} \]

then the solution to Ax = b is


Cramer's Rule

\[ x_i = \frac{\det(A_i)}{\det(A)} \tag{10} \]

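Applying Cramer's rule to the system of equations in (4),

\[
\begin{aligned}
2x - 3y &= 4 \\
x + 2y &= 9
\end{aligned}
\]

we obtain

\[
x = \frac{\begin{vmatrix} 4 & -3 \\ 9 & 2 \end{vmatrix}}
         {\begin{vmatrix} 2 & -3 \\ 1 & 2 \end{vmatrix}}
  = \frac{4 \times 2 - (-3) \times 9}{2 \times 2 - (-3) \times 1}
  = \frac{35}{7} = 5
\]

\[
y = \frac{\begin{vmatrix} 2 & 4 \\ 1 & 9 \end{vmatrix}}
         {\begin{vmatrix} 2 & -3 \\ 1 & 2 \end{vmatrix}}
  = \frac{2 \times 9 - 4 \times 1}{7}
  = \frac{14}{7} = 2
\]

the same solution as that we obtained earlier.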
As long as the denominator does not vanish, the determinant formula 
in (10) provides a unique solution to the equation. 


Theorem 1. Let A be an n-by-n matrix and let b be an arbitrary n-vector.
If det(A) ≠ 0, the system of equations Ax = b has a unique solution
given by Cramer's rule.
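For the 2-by-2 case, Cramer's rule is short enough to write out as a program. The Python sketch below is illustrative only: the names `det2`, `replace_column`, and `cramer2` are our own, exact fractions are used to avoid roundoff, and the case det(A) = 0 is simply rejected because the rule gives no answer there.

```python
from fractions import Fraction


def det2(M):
    """Determinant of a 2-by-2 matrix: main diagonal minus off diagonal."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def replace_column(A, i, b):
    """A_i: the matrix A with its ith column (0-based) replaced by b."""
    return [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]


def cramer2(A, b):
    d = det2(A)
    if d == 0:
        raise ValueError("det(A) = 0: Cramer's rule does not apply")
    return [Fraction(det2(replace_column(A, i, b)), d) for i in range(2)]


# System (4): 2x - 3y = 4, x + 2y = 9.
print(cramer2([[2, -3], [1, 2]], [4, 9]))   # [Fraction(5, 1), Fraction(2, 1)]
```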
Theorem 1 says nothing about what happens if det(A) = 0. If det(A)
= 0 but one or more $\det(A_i) \neq 0$, then no solution is possible. However,
if all $\det(A_i) = 0$ as well as det(A) = 0, then the formulas in (10) become
0/0 (undefined) and solutions may be possible. The following examples
from Chapter 1 illustrate this situation.


Example 1. Canoe with Sail Revisited


In Example 4 of Section 1.1 we modified the standard high school 
algebra problem about the speed of a canoe and the speed of the stream 
by placing a sail at the front of the canoe and giving equations for the 
canoe’s speed C and the wind’s speed W when going upwind (into the 
wind) and downwind. 


Upwind:     C + kW = U    (11)
Downwind:   C + W = D


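In Section 1.1 we solved these equations by elimination. Now
we can solve (11) by using Cramer's rule, where b = (U, D) and

\[
A = \begin{bmatrix} 1 & k \\ 1 & 1 \end{bmatrix},
\qquad
\det(A) = \begin{vmatrix} 1 & k \\ 1 & 1 \end{vmatrix}
= 1 \cdot 1 - k \cdot 1 = 1 - k
\]

We calculate

\[
\det(A_1) = \begin{vmatrix} U & k \\ D & 1 \end{vmatrix}
= U \cdot 1 - kD = U - kD,
\qquad
\det(A_2) = \begin{vmatrix} 1 & U \\ 1 & D \end{vmatrix}
= D - U
\]

and so, by Cramer's rule,

\[
C = \frac{U - kD}{1 - k}
\quad\text{and}\quad
W = \frac{D - U}{1 - k}
\tag{12}
\]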


In Section 1.1 we tried the value of k = 1 and found that the
equations (11) had no solution (for U ≠ D); the formulas in (12) have
zero denominators. That is, for k = 1, det(A) = 1 - 1 = 0. When
k = 1, the two rows of A are equal and the two equations in (11)
represent parallel lines that never intersect.

We also tried letting k = .75. For values of U = 5, D = 7 the
system becomes


Upwind:     C + .75W = 5
Downwind:   C + W = 7

We obtained the solution C = -1, W = 8, but a negative value
makes no sense in the real world. The problem is that although
det(A) = 1 - .75 ≠ 0, det(A) is still close to 0. Informally, the
system almost has no solution, so the answer is unreliable. In any
2-by-2 system, when det(A) is much smaller than $\det(A_1)$ and
$\det(A_2)$, the answer should be treated with suspicion. ■
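A small experiment makes the unreliability concrete. The Python sketch below evaluates the Cramer's-rule solution C = (U - kD)/(1 - k), W = (D - U)/(1 - k) with exact fractions; the function name `canoe` and the perturbed measurement U = 5.1 are our own illustrative choices.

```python
from fractions import Fraction


def canoe(k, U, D):
    """Solve C + kW = U, C + W = D by Cramer's rule; returns (C, W)."""
    det = 1 - k
    if det == 0:
        raise ValueError("k = 1 gives det(A) = 0")
    return (U - k * D) / det, (D - U) / det


k = Fraction(3, 4)
print(canoe(k, 5, 7))                  # (Fraction(-1, 1), Fraction(8, 1))

# Change the measured upwind speed U by only 0.1 (hypothetical perturbation).
print(canoe(k, Fraction(51, 10), 7))   # (Fraction(-3, 5), Fraction(38, 5))
```

Dividing by det(A) = 1/4 multiplies any error in the numerators by 4: here a change of 0.1 in U moves both C and W by 0.4, and the magnification grows without bound as k approaches 1.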


The link between equal rows and det(A) = 0 mentioned in Example
1 is true for all square matrices. By a symmetry argument, the result also
holds for columns.


Proposition 1. If one row (column) of an n-by-n matrix A equals, or is a
multiple of, another row (column), then det(A) = 0.


Example 2. Stable Rabbit-Fox 
Populations Revisited 


In Example 3 of Section 1.3 we considered a linear growth model for 
rabbit and fox populations: 


\[
\begin{aligned}
R' &= R + bR - eF \\
F' &= F - dF + e'R
\end{aligned}
\tag{13}
\]

Here R, F are current population sizes and R', F' are the sizes 1 month
later. We set R' = R and F' = F to solve for stable (unchanging)
population sizes, and obtained

\[
\begin{aligned}
bR - eF &= 0 \\
e'R - dF &= 0
\end{aligned}
\tag{14}
\]
Obviously, R = F = 0 is a solution to (14), but we want "nontrivial"
(nonzero) solutions. We found them in Chapter 1 by ad hoc
means. Now we can use determinants. Let

\[
A = \begin{bmatrix} b & -e \\ e' & -d \end{bmatrix}
\quad\text{so}\quad
\det(A) = b(-d) - (-e)e' = -bd + ee'
\]

Then by Theorem 1, if det(A) ≠ 0, (14) has a unique solution. Clearly,
R = F = 0 is such a solution. For other solutions we must have
det(A) = 0. Cramer's rule would now give $R = \det(A_1)/\det(A)$ =
0/0 (undetermined), and similarly for F [note $\det(A_1) = \det(A_2) = 0$
since the right side in (14) is 0]. To have det(A) = 0, we require that
det(A) = -bd + ee' = 0, or

\[ \frac{e}{b} = \frac{d}{e'} \]

The equality of these ratios is just an algebraic way of saying
that the first row in (14) must be a multiple of the second row. Such
a relation between the rows when det(A) = 0 was predicted by
Proposition 1. When e/b = d/e', one can check that any R, F pair with
R = (e/b)F = (d/e')F is a solution to (14).

In Section 1.3 we considered the system

\[
\begin{aligned}
.1R - .15F &= 0 \\
.1R - .15F &= 0
\end{aligned}
\tag{15}
\]

In (15), solutions are of the form R = (.15/.1)F = 1.5F. ■


We now consider determinants of a 3-by-3 matrix and more generally 
an n-by-n matrix. Remember that the determinant of a square matrix A is 
defined to be the algebraic expression that appears in the denominator when 
we algebraically solve the matrix equation Ax = b. 

One can show that the determinant of a 3-by-3 matrix A is calculated 
by multiplying the numbers lying on the 6 ‘‘diagonals’’ in the augmented 
3-by-5 array shown below. The products marked by solid lines have plus 
signs and the products marked by dashed lines have minus signs. 


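\[
\begin{array}{ccccc}
a_{11} & a_{12} & a_{13} & a_{11} & a_{12} \\
a_{21} & a_{22} & a_{23} & a_{21} & a_{22} \\
a_{31} & a_{32} & a_{33} & a_{31} & a_{32}
\end{array}
\]

(The three diagonals running down to the right carry the plus signs; the
three running down to the left carry the minus signs.)

\[
\det(A) = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}
        - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32} \tag{16}
\]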


Warning: This process does not apply when n > 3. 
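The six-diagonal rule in (16) is easy to turn into a few lines of code. The following is only an illustrative sketch: the function name `det3` and the sample matrices are assumptions made for this example, not notation from the text.

```python
def det3(m):
    """Determinant of a 3-by-3 matrix by the six-diagonal rule in (16)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    # Products along the three diagonals that get plus signs.
    plus = a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
    # Products along the three diagonals that get minus signs.
    minus = a13 * a22 * a31 + a12 * a21 * a33 + a11 * a23 * a32
    return plus - minus


# A sample matrix (hypothetical values chosen for illustration).
sample = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(det3(sample))

# Second row is twice the first row, so Proposition 1 predicts 0.
dependent = [[1, 2, 3], [2, 4, 6], [7, 8, 9]]
print(det3(dependent))
```

As the warning says, this pattern is special to 3-by-3 matrices; the function deliberately accepts nothing larger.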


Example 3. Solving the Refinery Equations by 
Cramer’s Rule 


In Section 1.2 we discussed a system of equations for controlling the 
production of three refineries: 


\[
\begin{aligned}
20x_1 + 4x_2 + 4x_3 &= 500 \\
10x_1 + 14x_2 + 5x_3 &= 850 \\
5x_1 + 5x_2 + 12x_3 &= 1000
\end{aligned}
\]


\[
\begin{aligned}
\det(A) = \begin{vmatrix} 20 & 4 & 4 \\ 10 & 14 & 5 \\ 5 & 5 & 12 \end{vmatrix}
&= 20 \times 14 \times 12 + 4 \times 5 \times 5 + 4 \times 10 \times 5 \\
&\quad - 4 \times 14 \times 5 - 4 \times 10 \times 12 - 20 \times 5 \times 5 \\
&= 3360 + 100 + 200 - 280 - 480 - 500 \\
&= 2400
\end{aligned}
\]

To determine $x_1$ using Cramer's rule, we need $\det(A_1)$ (recall that $A_1$
is obtained from A by replacing the first column of A by the numbers
on the right side of the equations).


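\[
\begin{aligned}
\det(A_1) = \begin{vmatrix} 500 & 4 & 4 \\ 850 & 14 & 5 \\ 1000 & 5 & 12 \end{vmatrix}
&= 500 \times 14 \times 12 + 4 \times 5 \times 1000 + 4 \times 850 \times 5 \\
&\quad - 4 \times 14 \times 1000 - 4 \times 850 \times 12 - 500 \times 5 \times 5 \\
&= 84{,}000 + 20{,}000 + 17{,}000 - 56{,}000 - 40{,}800 - 12{,}500 \\
&= 11{,}700
\end{aligned}
\]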


We compute $x_1$:

\[
x_1 = \frac{\det(A_1)}{\det(A)} = \frac{11{,}700}{2{,}400} = 4\tfrac{7}{8}
\]

It is left as an exercise for the reader to determine $x_2$ and $x_3$.
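For readers who want to check their hand computation, Cramer's rule for a 3-by-3 system can be sketched as below. The helper names `det3` and `cramer3` are hypothetical, and exact fractions are used (an implementation choice, not something the text prescribes) so that no rounding hides a mistake.

```python
from fractions import Fraction


def det3(m):
    """3-by-3 determinant by the six-diagonal rule."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    return (a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
            - a13 * a22 * a31 - a12 * a21 * a33 - a11 * a23 * a32)


def cramer3(a, b):
    """Solve a 3-by-3 system Ax = b by Cramer's rule; requires det(A) != 0."""
    d = det3(a)
    if d == 0:
        raise ValueError("det(A) = 0, so Cramer's rule does not apply")
    solution = []
    for j in range(3):
        # A_j is A with column j replaced by the right-side numbers.
        a_j = [[b[i] if k == j else a[i][k] for k in range(3)] for i in range(3)]
        solution.append(Fraction(det3(a_j), d))
    return solution


refinery = [[20, 4, 4], [10, 14, 5], [5, 5, 12]]
demand = [500, 850, 1000]
x = cramer3(refinery, demand)
print(x[0])  # first production level, as an exact fraction

# Substitute back into each equation to confirm the solution.
for row, rhs in zip(refinery, demand):
    assert sum(c * v for c, v in zip(row, x)) == rhs
```

The final loop checks the answer against the original equations rather than against a memorized value, which is the safer habit when working by hand as well.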


Even for 3-by-3 determinants, the calculations are messy. It gets so 
complicated beyond 3-by-3 that one has to resort to a general form of de- 
scription of a determinant (we are lucky that such a description even exists). 


Computational Definition. The determinant of an n-by-n matrix A is 
formed by adding or subtracting all possible products of n entries in- 
volving one entry from each row and each column (there is a technical 
rule of signs for determining whether the product gets a plus or minus 
sign). 


The reader should check that our formulas for 2-by-2 and 3-by-3 determinants
involved all products of this sort. A counting argument shows
that there are $n!$ [$= n(n-1)(n-2)\cdots 3 \times 2 \times 1$] such products in an
n-by-n determinant. For example, a 10-by-10 determinant has 10! =
3,628,800 products. For this reason, one never solves a large system of
equations using determinants.
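The computational definition can be tried out directly on small matrices. The sketch below assumes the usual rule of signs (a product gets a plus sign when the column order is an even permutation, counted here by inversions); the text only alludes to that rule, so treat the `sign` helper as an assumption. The function also reports how many products were formed, which makes the n! growth visible.

```python
from itertools import permutations


def sign(perm):
    """+1 for an even permutation, -1 for an odd one (parity of inversions)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det_by_definition(m):
    """Return (determinant, number of products) for a square matrix m."""
    n = len(m)
    total = 0
    terms = 0
    # Each permutation picks one entry from each row and each column.
    for perm in permutations(range(n)):
        product = sign(perm)
        for row, col in enumerate(perm):
            product *= m[row][col]
        total += product
        terms += 1
    return total, terms


print(det_by_definition([[20, 4, 4], [10, 14, 5], [5, 5, 12]]))
```

Running this on anything much beyond 8-by-8 already takes noticeable time, which is the practical point of the paragraph above.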

There is one special class of matrices that arises frequently in theory 
and applications for which the determinant is very easy to compute. A square 


matrix is upper triangular if all entries below the main diagonal are zero, 
such as 


» &@ 4 
A=10 1 7 (17) 
GG 8 2 
A lower triangular matrix has 0’s above the main diagonal. 


Proposition 2. If A is an upper or a lower triangular matrix, $\det(A) =
a_{11}a_{22} \cdots a_{nn}$, the product of entries on the main diagonal.

Proof. Except for the product of main-diagonal entries, any other product
of n entries, each in a different row and column, will have to
contain a 0 entry below (or above) the main diagonal, so all such
other products are 0.


By Proposition 2, $\det(A) = 2 \times 1 \times 2 = 4$ for the matrix A in (17).
A special upper (and lower) triangular matrix is the identity matrix I (with
1's on the main diagonal and 0's elsewhere). Then by Proposition 2,
det(I) = 1.

In Section 3.2 we shall learn how to transform any square matrix A 
into an upper triangular matrix U in a manner that does not change the value 
of the determinant. Using Proposition 2, we will then be able to compute 
det(A) simply by taking the product of the main-diagonal entries of U. 
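As a preview of that approach, here is a hedged sketch: it reduces A to upper triangular form by subtracting multiples of one row from the rows below it (the operation Section 3.2 develops, and one that leaves the determinant unchanged) and then multiplies the diagonal entries. The name `det_by_elimination` is hypothetical, and the sketch simply gives up if it meets a zero on the diagonal, a case it does not try to handle.

```python
from fractions import Fraction


def det_by_elimination(a):
    """Determinant via reduction to upper triangular form (no row exchanges)."""
    n = len(a)
    u = [[Fraction(v) for v in row] for row in a]
    for j in range(n - 1):
        if u[j][j] == 0:
            raise ZeroDivisionError("zero pivot; row exchanges are outside this sketch")
        for i in range(j + 1, n):
            mult = u[i][j] / u[j][j]
            # Subtract mult times row j from row i; the determinant is unchanged.
            u[i] = [x - mult * y for x, y in zip(u[i], u[j])]
    product = Fraction(1)
    for k in range(n):
        product *= u[k][k]  # Proposition 2 applied to the triangular result
    return product


print(det_by_elimination([[20, 4, 4], [10, 14, 5], [5, 5, 12]]))
```

The work here grows roughly like n cubed rather than n!, which is why this route, and not the computational definition, is the one used in practice.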

There is one additional nice property of determinants that we will need 
later in Section 3.2. 


Proposition 3. The determinant of a matrix product is the product of the 
determinants: 


det(AB) = det(A) det(B) 
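A quick numerical check of Proposition 3 for one pair of 2-by-2 matrices is sketched below. It is not a proof, and the matrix B is an arbitrary choice made for illustration; `det2` and `matmul2` are hypothetical helper names.

```python
def det2(m):
    """Determinant of a 2-by-2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matmul2(a, b):
    """Product of two 2-by-2 matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


a = [[3, 1], [2, 2]]
b = [[1, 2], [3, 5]]  # arbitrary illustrative matrix
ab = matmul2(a, b)
print(det2(ab), det2(a) * det2(b))
```

Both printed numbers agree, as the proposition says they must.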


It was noted at the start of this section that one of the chief reasons
for studying determinants was their role in finding eigenvalues of a matrix.
Recall from Section 2.5 that the defining equation for an eigenvalue $\lambda$ and
its eigenvector u is

\[
A\mathbf{u} = \lambda\mathbf{u} \quad\text{or}\quad A\mathbf{u} - \lambda\mathbf{u} = \mathbf{0}
\quad\text{or}\quad (A - \lambda I)\mathbf{u} = \mathbf{0} \tag{18}
\]

Given the eigenvalue $\lambda$, we can determine u by solving the system of
equations in (18). More important, we can use (18) to determine the
eigenvalues of A. To do this, we recall from Theorem 1 that if
$\det(A - \lambda I) \neq 0$, then (18) has only one solution, namely u = 0. Since an eigenvector
cannot be the zero vector 0, to get eigenvalues we need to choose $\lambda$ so that
$\det(A - \lambda I) = 0$.


Theorem 2. The values $\lambda$ that make $\det(A - \lambda I) = 0$ are eigenvalues
of A. The associated eigenvector(s) for $\lambda$ are the nonzero solutions to
$(A - \lambda I)\mathbf{x} = \mathbf{0}$.

To prove Theorem 2 requires vector space theory developed in Chapter
5. For any matrix A, $\det(A - \lambda I)$ will be a polynomial in $\lambda$, called the
characteristic polynomial of A. The zeros of the characteristic polynomial
of A are the eigenvalues of A. Remember that the eigenvector associated
with an eigenvalue $\lambda$ is actually a family of eigenvectors: If $A\mathbf{u} = \lambda\mathbf{u}$, then
for any r, $A(r\mathbf{u}) = \lambda r\mathbf{u}$.
Example 4. Determining Eigenvalues 
and Eigenvectors 


Consider the system of computer—dog growth equations from Section 
2.5. 


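\[
\begin{aligned}
C' &= 3C + D \\
D' &= 2C + 2D
\end{aligned}
\quad\text{or}\quad \mathbf{c}' = A\mathbf{c}, \quad\text{where}\quad
A = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}
\]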


In Section 2.5 the eigenvalues and eigenvectors were given without 
any explanation of how they were found. Let us calculate them now. 
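By Theorem 2 the eigenvalues are the zeros of the characteristic
polynomial $\det(A - \lambda I)$:

\[
\begin{aligned}
\det(A - \lambda I) = \begin{vmatrix} 3 - \lambda & 1 \\ 2 & 2 - \lambda \end{vmatrix}
&= (3 - \lambda)(2 - \lambda) - 1 \cdot 2 \\
&= (6 - 5\lambda + \lambda^2) - 2 \\
&= 4 - 5\lambda + \lambda^2 \\
&= (4 - \lambda)(1 - \lambda)
\end{aligned} \tag{19}
\]

So the zeros of $\det(A - \lambda I) = (4 - \lambda)(1 - \lambda)$ are 4 and 1.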
To find an eigenvector u for the eigenvalue 4, we must solve the
system Au = 4u or, by matrix algebra, $(A - 4I)\mathbf{u} = \mathbf{0}$, where

\[
A - 4I = \begin{bmatrix} 3 - 4 & 1 \\ 2 & 2 - 4 \end{bmatrix}
       = \begin{bmatrix} -1 & 1 \\ 2 & -2 \end{bmatrix}
\]

We find that

\[
\begin{aligned}
-u_1 + u_2 &= 0 \;\rightarrow\; u_1 = u_2 \\
2u_1 - 2u_2 &= 0
\end{aligned}
\]

The second equation here is just $-2$ times the first equation (so it is
superfluous). Then u is an eigenvector if $u_1 = u_2$ or equivalently if
u is a multiple of [1, 1].

It is left as an exercise for the reader to verify that $\mathbf{v} = [1, -2]$
is an eigenvector for $\lambda = 1$ by showing that this v is a solution to
$A\mathbf{v} = \mathbf{v}$ or $(A - I)\mathbf{v} = \mathbf{0}$.
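The calculation in this example can be mirrored in code. The sketch below relies on the fact that, for a 2-by-2 matrix with rows [a, b] and [c, d], expanding $\det(A - \lambda I)$ gives $\lambda^2 - (a + d)\lambda + (ad - bc)$, and then applies the quadratic formula. The function names are hypothetical, and matrices whose eigenvalues are not real are simply rejected.

```python
import math


def eigenvalues_2x2(m):
    """Real eigenvalues of a 2-by-2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = m
    trace = a + d          # minus the coefficient of lambda
    det = a * d - b * c    # the constant term
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are not real")
    root = math.sqrt(disc)
    return (trace + root) / 2, (trace - root) / 2


def is_eigenvector(m, lam, u, tol=1e-9):
    """Check A u = lam u for a nonzero 2-vector u."""
    (a, b), (c, d) = m
    au = (a * u[0] + b * u[1], c * u[0] + d * u[1])
    nonzero = u[0] != 0 or u[1] != 0
    return nonzero and all(abs(au[i] - lam * u[i]) < tol for i in range(2))


A = [[3, 1], [2, 2]]
print(eigenvalues_2x2(A))
print(is_eigenvector(A, 4, (1, 1)), is_eigenvector(A, 1, (1, -2)))
```

The second helper checks a candidate eigenvector by substitution, which is exactly the verification the text leaves to the reader.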


Example 5. Eigenvalues and Eigenvectors for 
Rabbit—Fox Population Model 


Consider the rabbit—-fox growth model 


\[
\begin{aligned}
R' &= R + .1R - .15F \\
F' &= F + .1R - .15F
\end{aligned}
\quad\text{or}\quad
\begin{bmatrix} R' \\ F' \end{bmatrix} =
\begin{bmatrix} 1.1 & -.15 \\ .1 & .85 \end{bmatrix}
\begin{bmatrix} R \\ F \end{bmatrix} \tag{20}
\]

which we studied in Section 1.3. Let us find both eigenvalues and
associated eigenvectors u, v, so that we can write a starting population
vector p in terms of u and v, p = au + bv, and use these eigenvectors
to compute $\mathbf{p}^{(k)}$, the population vector for k periods.

We first compute $\det(A - \lambda I)$, the characteristic polynomial of
A.


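\[
\begin{aligned}
\det(A - \lambda I) &= \begin{vmatrix} 1.1 - \lambda & -.15 \\ .1 & .85 - \lambda \end{vmatrix} \\
&= (1.1 - \lambda)(.85 - \lambda) - (.1)(-.15) \\
&= \lambda^2 - 1.95\lambda + .95
\end{aligned} \tag{21}
\]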


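By factoring or the quadratic formula, we find the zeros to be $\lambda = 1$
and $\lambda = .95$.
To find an eigenvector u associated with $\lambda = 1$, we solve

\[
(A - I)\mathbf{u} = \mathbf{0}: \quad
\begin{aligned}
.1u_1 - .15u_2 &= 0 \;\rightarrow\; u_1 = \tfrac{3}{2}u_2, \quad \mathbf{u} = [3, 2] \\
.1u_1 - .15u_2 &= 0
\end{aligned} \tag{22}
\]

So nonzero multiples of u = [3, 2] are eigenvectors for $\lambda = 1$, that
is, stable population vectors.

For completeness, we solve for the eigenvector of $\lambda = .95$:

\[
(A - .95I)\mathbf{v} = \mathbf{0}: \quad
\begin{aligned}
.15v_1 - .15v_2 &= 0 \;\rightarrow\; v_1 = v_2, \quad \mathbf{v} = [1, 1] \\
.1v_1 - .1v_2 &= 0
\end{aligned} \tag{23}
\]

So nonzero multiples of v = [1, 1] are eigenvectors of $\lambda = .95$.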


With the eigenvalues and eigenvectors, we can now explain the 
behavior of this model that we observed in Section 1.3. In doing so, 
we illustrate the basic role of eigenvalues and eigenvectors in describ- 
ing the long-term behavior of dynamic systems. 

In Section 1.3 we started with the [R, F] pair = [50, 40] and 
followed our model [equations (20)] for many periods: 


      0 months:   50   rabbits,   40   foxes
      1 month:    49   rabbits,   39   foxes
      2 months:   48   rabbits,   38   foxes
      3 months:   47   rabbits,   37   foxes
     10 months:   42   rabbits,   32   foxes          (24)
     20 months:   37   rabbits,   27   foxes
     50 months:   31.5 rabbits,   21.5 foxes
    100 months:   30.1 rabbits,   20.1 foxes


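Let us express the starting vector p = [50, 40] in terms of the
eigenvectors u = [3, 2] and v = [1, 1]: p = au + bv:

\[
\begin{bmatrix} 50 \\ 40 \end{bmatrix} = a\begin{bmatrix} 3 \\ 2 \end{bmatrix} + b\begin{bmatrix} 1 \\ 1 \end{bmatrix}:
\quad
\begin{aligned}
3a + b &= 50 \\
2a + b &= 40
\end{aligned} \tag{25}
\]

By Cramer's rule, we find that a = 10 and b = 20. Thus

\[
\mathbf{p} = 10\mathbf{u} + 20\mathbf{v} \tag{26}
\]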


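Then using (26) to compute the population sizes in (24) gives

\[
\begin{aligned}
\mathbf{p}' = A\mathbf{p} &= 10A\mathbf{u} + 20A\mathbf{v} = 10(1\mathbf{u}) + 20(.95\mathbf{v}) \\
&= 10[3, 2] + 19[1, 1] = [49, 39]
\end{aligned}
\]

and

\[
\begin{aligned}
\mathbf{p}^{(k)} = A^k\mathbf{p} &= 10A^k\mathbf{u} + 20A^k\mathbf{v} = 10(1^k\mathbf{u}) + 20(.95^k\mathbf{v}) \\
&= 10[3, 2] + 20 \times .95^k[1, 1] \\
&= [30, 20] + .95^k[20, 20]
\end{aligned} \tag{27}
\]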


The second term $.95^k[20, 20]$ in the last line of (27) slowly goes
to 0, leaving the stable population vector [30, 20]. With (27), the
behavior in table (24) is completely explained!
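The agreement between month-by-month iteration and the eigenvector formula can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, floating-point arithmetic is used, and the months printed are the ones listed in table (24).

```python
def step(p):
    """One month of the rabbit-fox model (20)."""
    r, f = p
    return (r + 0.1 * r - 0.15 * f, f + 0.1 * r - 0.15 * f)


def simulate(p, months):
    """Apply the model repeatedly, starting from the population pair p."""
    for _ in range(months):
        p = step(p)
    return p


def closed_form(k):
    """Formula (27): [30, 20] + .95^k [20, 20] for the start [50, 40]."""
    decay = 20 * 0.95 ** k
    return (30 + decay, 20 + decay)


for k in (0, 1, 2, 3, 10, 20, 50, 100):
    iterated = simulate((50.0, 40.0), k)
    formula = closed_form(k)
    print(k, [round(v, 1) for v in iterated], [round(v, 1) for v in formula])
```

The two columns agree to within rounding, and both drift toward [30, 20] as k grows.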

If we generalize the calculation in (27) and the starting vector p
has the eigenvector representation p = au + bv, then

\[
\begin{aligned}
\mathbf{p}^{(k)} = A^k\mathbf{p} &= aA^k\mathbf{u} + bA^k\mathbf{v} = a\mathbf{u} + b \times .95^k\mathbf{v} \\
&= [3a, 2a] + .95^k[b, b]
\end{aligned} \tag{28}
\]

So the long-term stable population is [3a, 2a]. The critical number
is a. To find a for the general starting vector p = [R, F], we
substitute [R, F] for [50, 40] in (25) and apply Cramer's rule.

\[
a = \frac{\det(A_1)}{\det(A)}
  = \frac{\begin{vmatrix} R & 1 \\ F & 1 \end{vmatrix}}{\begin{vmatrix} 3 & 1 \\ 2 & 1 \end{vmatrix}}
  = \frac{R - F}{1} = R - F
\]

Section 3.1 Exercises


Summary of Exercises 

Exercises 1-21 involve properties of determinants and their use in solving 
systems of equations, with Exercises 13-21 involving theory. Exercises
22-28 involve using determinants to find eigenvalues. Exercises 29-34 
present theoretical properties of the characteristic polynomial, including the 
Cayley—Hamilton theorem. Exercises 35 and 36 introduce the euclidean 
norm of a matrix. 


1. Compute the determinant of the following matrices.


bp ak $2 
« b a si pes 8 es F ’ 


2. Find the (unique) solution to the following systems of equations, if
possible, using Cramer's rule.

(a)  x + y = 34        (b)   2x - 3y = 5        (c)  3x + y = 7
    2x - y = 30             -4x + 6y = 10            2x - 2y = 7


3. Consider the two-refinery production of diesel oil and gasoline. The
second refinery has not been built, but when it is built it will produce
twice as much gas as diesel oil from each barrel of crude oil. We have

    Diesel oil:  10x_1 +  ax_2 = D
    Gasoline:     5x_1 + 2ax_2 = G

where a is to be determined, D is demand for diesel oil, and G is demand
for gasoline (and x_i is number of barrels of crude oil processed by
refinery i, i = 1, 2). Solve this system of equations to determine x_1
and x_2 in terms of a, D, G using Cramer's rule.


4. Consider the two-refinery production of diesel oil and gasoline. The
second refinery has not been built, but when it is built it will produce
15 gallons of gasoline and k gallons of diesel oil from each barrel of
crude oil. We have

    Diesel oil:  10x_1 +  kx_2 = D
    Gasoline:     3x_1 + 15x_2 = G

where k is to be determined, D is demand for diesel oil, and G is demand
for gasoline (and x_i is number of barrels of crude oil processed by
refinery i, i = 1, 2). Solve this system of equations to determine x_1
and x_2 in terms of k, D, G using Cramer's rule. What value of k yields
a nonunique solution? In practical terms, what does this nonuniqueness
mean?


5. Which of the following systems of equations have nonzero solutions?
If the solution is not unique, give the set of all possible solutions.

(a)  3x + 4y = 0        (b)  4x - y = 0        (c)  2x - 6y = 0
     6x + 2y = 0             4x - y = 0             -x + 3y = 0


6. When the right-hand side is nonzero and the determinant is 0, there
may be no solution to the system of equations. Which of the following
systems of equations have no solution?

(a)  3x + 2y = 2        (b)  2x - 3y = 2        (c)  2x - 6y = 4
     6x + 4y = 2             2x - 3y = 2             -x + 3y = -2


7. Compute the determinant of the following matrices.


t 2. 3 2 0 —|] Oo 2 °@ 
igual 2 4 (b) | 0 0 3 (eq. 1f 2 4g 
a0 by 2 0 —-] Ze 
Bade. Pek oy a b 4 
(d) |}O 2 2 (e) |}O O 2 (ff) |}7 8 9 
(2. 3 ae a 4 4 


8. In which matrices in Exercise 7 is one row or column a multiple of
another (so that by Proposition 1, the determinant will be 0)?


9. Use Cramer's rule to solve for x_2 and x_3 in Example 3.

10. Solve the following systems of equations using Cramer's rule.
(a) 2x — y+2z=4 (b) xt y+2=3 
Xx + 3z=6 2x +3y+2z=9 
wWwn~- 2=] —“ye yom = =4 
(cj) =—2 FSi — z= 4 
Sey = 6 
x + 2'= 3 


11. Consider the following system of equations for the growth of rabbits
(R), foxes (F), and humans (H).


R= K+ 33K = AF = 2m 
F' =F + AR — .2F — 3H 
H=H ARF OM + 1 


We want to see if stable population sizes are possible (when R' = R,
F' = F, H' = H). Set up the stable population system of equations
[similar to (15) in Example 2] and compute the determinant to see if a
nonzero solution is possible (do not try to find a stable solution).


12. Repeat Exercise 11 with the system

R’ = R + .3R — 2F — 220 
Fo =F + 4R — .2F — AH 
H =H + .2R + .1F + 1H 


13. If you double the first row in the system
ax + by =e 
cx + dy = f 


show using Cramer’s rule that the solution does not change. 


14. If you double the first column in the system
ax + by =e 
cx + dy = f 


show using Cramer’s rule that the value of x is half as large and the 
value of y is unchanged. 


15. (a) If you interchange the rows of a 2-by-2 matrix A, show that the
determinant of the new matrix is -1 times det(A).
Hint: Use (6). The same is true for interchanging columns.

(b) If you interchange the first two rows of a 3-by-3 matrix A, show
that the determinant of the new matrix is -1 times det(A).
Hint: Use (16).


16. From the computational definition for a determinant, deduce that for
any square matrix A, its transpose $A^T$ (obtained by interchanging rows
and columns) has the same determinant as A. (Thus $A^T\mathbf{x} = \mathbf{b}$ has a
unique solution if and only if Ax = b does.)


17. From the computational definition for a determinant, deduce that any
square matrix A with a row (or column) of all 0's has det(A) = 0.


18. (a) From the computational definition for a determinant, deduce that
if B is a square 3-by-3 matrix obtained from A by doubling every
entry in the second row of matrix A, then $\det(B) = 2 \cdot \det(A)$.

(b) More generally, if every entry in a row (or column) of an n-by-n
matrix A is multiplied by a constant k, the determinant of the
resulting matrix equals $k \cdot \det(A)$.


19. Compute the determinant of the following matrices.


2378 0000 L200 6 

0391 20.0 0 00300 

Mio o 1 5| 14500] @Wlo 001 0 
0004 44) $0 00002 

20002 
20. Let A and B be arbitrary 2-by-2 matrices. Using (6), show that
det(AB) = det(A) det(B).


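21. In the following figure, the area of the triangle ABC can be expressed
as

    area ABC = area ABB'A' + area BCC'B' - area ACC'A'          (*)

[Figure: triangle ABC with vertices A = (x_1, y_1), B = (x_2, y_2), and
C = (x_3, y_3); the points A', B', C' lie directly below A, B, C on the
horizontal axis.]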

Using (*) and the fact that the area of a trapezoid is one-half of the 
distance between the parallel sides times the sum of the lengths of the 
parallel sides, show that 


| ee, ae 
area ABC = D X>5 Yo l 
xX, y3 | 


22. Determine an eigenvector associated with $\lambda = 1$ in Example 4.


23. (a) Compute the eigenvalues of each of the following matrices.


Py ea Gy {' ase 
Mla 9 ed) Cee) ee tal Oe 
ee oe 
(iv) } 4 (Vv) 3 =-l <3 
a, 


(b) Determine an eigenvector associated with the largest eigenvalue, 
using the method in Example 4, for the matrices in part (a). 


24. (a) For the following rabbit-fox models, determine both eigenvalues.


iy R =R+> .K = 33F Ck = Rut sR. = PF 
Poa 2K = OF Fo= F + ASR = 


(b) Determine an eigenvector u associated with the largest eigenvalue 
in each system in part (a). 

(c) Determine the other eigenvector v (associated with the smaller
eigenvalue) for each system in part (a).

(d) If the initial population is x = [10, 10], express x as a linear
combination of u and v, as in equation (27), for each system in
part (a). Use this expression to describe in words the behavior of
this model over time.


25. The following system of equations was the first rabbit-fox model
analyzed in Section 1.3.


Ro =R + 2h = .3F 
Fa F + R= dF 


Determine the dominant eigenvalue and an associated eigenvector. 


26. (a) For the following rabbit-fox models, determine both eigenvalues.
(i) R’ =R+ AR + IF (ii) R’ = R + 2R — 3F 


F' =F + .2R + .1F F’ = F + 1.5R — 4.5F 


(b) Determine an eigenvector u associated with the largest eigenvalue 
in each system in part (a). 

(c) Determine the other eigenvector v (associated with the smaller 
eigenvalue) for each system in part (a). 

(d) If the initial population is x = [10, 10], express x as a linear
combination of u and v, as in equation (27), for each system in part
(a). Use this expression to describe in words the behavior of this
model over time.


27. The following growth model for elephants (E) and mice (M) predicts
population changes from decade to decade.

    E' = 3E +  M
    M' = 2E + 4M


(a) Determine the eigenvalues and associated eigenvectors for this sys- 
tem. 

(b) Suppose initially that we have p = [E, M] = [5, 5]. Write p as 
a linear combination of the eigenvectors. 

(c) Use the information in part (b) to determine an approximate value
for the population sizes in eight decades.


28. The following growth model for computer science teachers (T) and
programmers (P) predicts population changes from decade to decade.

    T' =  T -  P
    P' = 2T + 4P

(a) Determine the eigenvalues and associated eigenvectors for this
system.

(b) Suppose initially that we have p = [T, P] = [10, 100]. Write p
as a linear combination of the eigenvectors.

(c) Use the information in part (b) to determine an approximate value
for the population sizes in 12 decades.


29. Verify that the constant term in a characteristic polynomial is det(A).


30. Let A be a 2-by-2 matrix with all positive entries and det(A) > 0. Show
that the eigenvalues of A must be positive real numbers.


31. Show that the product of the eigenvalues of a 2-by-2 matrix A equals
det(A).

Hint: The product of the eigenvalues is the constant term in the
characteristic polynomial $\det(A - \lambda I)$. Note that this result is true for
matrices of any size.

32. Show that the sum of the eigenvalues of a 2-by-2 matrix A equals the
sum of the main-diagonal entries of A.

Hint: These quantities are both the coefficient of $\lambda$ in the characteristic
polynomial $\det(A - \lambda I)$. Note that this result is true for matrices of
any size.


q 
0 b 
(b) Generalize the result in part (a) to show that in any upper triangular 

matrix, the eigenvalues are just the entries on the main diagonal. 


(a) For a matrix A = , Show that eigenvalues are a and b. 


Hint: Recall that the determinant of such a matrix is simply the 
product of the main-diagonal entries. 


34. This exercise illustrates a famous result in linear algebra known as the
Cayley-Hamilton theorem, which says that a square matrix A satisfies
its characteristic equation, $\det(A - \lambda I) = 0$.

(a) Let $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$. So $\det(A - \lambda I) = (2 - \lambda)(2 - \lambda) -
1 \cdot 1 = \lambda^2 - 4\lambda + 3$. The characteristic equation of A is then
$\lambda^2 - 4\lambda + 3 = 0$. Verify that A satisfies its characteristic equation
by setting $\lambda = A$ and showing that $A^2 - 4A + 3I = 0$.

(b) The characteristic equation can be factored to $(\lambda - 3)(\lambda - 1) =
0$. Check that $(A - 3I)(A - I) = 0$.

(c) Following the same steps as in part (a), check that the matrix
$A = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}$ for the computer-dog model satisfies its
characteristic equation.


35. The euclidean norm $\|A\|_e$ of A satisfies $|A\mathbf{x}|_e \le \|A\|_e \cdot |\mathbf{x}|_e$, where $|\;|_e$
denotes the euclidean distance norm of a vector. If A is a symmetric
matrix, it can be proved that $\|A\|_e$ equals the largest eigenvalue of A (in
absolute value). Compute the euclidean norm of the following matrices
and compare this value with the sum norm and max norm of these
matrices.


Pf eine 0 3 
a & - | 2 mt | 4 


36. The euclidean norm (see Exercise 35) of a nonsymmetric matrix A is
equal to the square root of the largest eigenvalue (in absolute value) of
the symmetric matrix $A^TA$ (where $A^T$ is the transpose of A). Compute
the euclidean norm of the following matrices and compare this value
with the sum norm and max norm of these matrices.


C2 a 1 =f. J 0 4 
of] of] oft] #4 


Solving Systems of Equations 
by Elimination 


In this section we develop the general procedure of elimination for solving 
any system of m equations in n unknowns—to find the unique solution, if 
one exists, or to show that no unique solution exists. Elimination was devised 
by Karl Friedrich Gauss around 1820 to solve systems of linear equations 
that arose while solving a regression model (such as the one introduced in 
Section 1.4) to estimate locations in survey mapping. The method of elimi- 
nation was used in the beginning of Section 3.1 to find a general solution 
to a system of two equations in two unknowns. 

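The solution by elimination involves two stages. The first is to
transform the given system (as far as possible) into an upper triangular system
such as

\[
\begin{aligned}
2x_1 + x_2 - x_3 &= 1 \\
x_2 + 4x_3 &= 5 \\
x_3 &= 2
\end{aligned}
\]

The second stage is to use back substitution to obtain values for the
unknowns.

\[
\begin{aligned}
&x_3 = 2 \;\rightarrow\; x_2 + 4(2) = 5 \quad\text{or}\quad x_2 = -3; \text{ and} \\
&x_2 = -3,\; x_3 = 2 \;\rightarrow\; 2x_1 + (-3) - 2 = 1 \quad\text{or}\quad x_1 = 3
\end{aligned}
\]

The solution vector is thus $\mathbf{x} = [3, -3, 2]$.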


The elimination transformations in the first stage are based on the 
following two simple properties of equations: 


1. If we multiply both sides of an equation by a constant, this does not
affect the possible solutions to the equation.
2. If we add two equations together (add the left sides together and add the
right sides together), any solution to both of the original equations is also
a solution to the combined equation.


Combining these two properties repeatedly, we construct a new set of 
easily solved equations whose solution will be a solution to the original 
system of equations. 


Principle of Gaussian Elimination. Subtract multiples of the ith
equation to eliminate the ith variable from the remaining equations,
for $i = 1, 2, \ldots, n - 1$.


The best way to show how Gaussian elimination works is with some 
examples. Then we state the procedure in algebraic terms. 


Example 1. Gaussian Elimination Example

We start with a very simple system of two equations in two unknowns.

\[
\begin{array}{lrcr}
\text{(a)} & x + y &=& 4 \\
\text{(b)} & 2x - y &=& -1
\end{array}
\]

To eliminate the 2x term from (b), we subtract 2 times (a) from (b),
and obtain the following new second equation:

\[
\begin{array}{lrcr}
\text{(b)} & 2x - y &=& -1 \\
-2\text{(a)} & -(2x + 2y &=& 8) \\
\hline
\text{(b')} = \text{(b)} - 2\text{(a)} & 0 - 3y &=& -9
\end{array}
\]

Our new system of equations is

\[
\begin{array}{lrcr}
\text{(a)} & x + y &=& 4 \\
\text{(b')} & -3y &=& -9
\end{array}
\]

By properties 1 and 2, any solution to (a) and (b) is also a solution
to (a) and (b'). Further, we can reverse the step creating (b'). That is,
(b') = (b) - 2(a) implies that (b) = (b') + 2(a). Thus (b) is formed
from (b') and a multiple of (a), so any solution to (a) and (b') is a
solution to (a) and (b).
But (b') is trivial to solve, and gives

\[
y = 3
\]

Substituting y = 3 in (a), we have

\[
x + 3 = 4 \;\rightarrow\; x = 4 - 3 = 1
\]

The reader should check that x = 1, y = 3 is a solution to (a)
and (b).


Example 2. Gaussian Elimination for
Refinery Problem

Recall the refinery problem introduced in Section 1.2 with three
refineries whose production levels had to be chosen to meet the demands
for heating oil, diesel oil, and gasoline.

\[
\begin{array}{llrcr}
\text{Heating oil:} & \text{(a)} & 20x_1 + 4x_2 + 4x_3 &=& 500 \\
\text{Diesel oil:} & \text{(b)} & 10x_1 + 14x_2 + 5x_3 &=& 850 \\
\text{Gasoline:} & \text{(c)} & 5x_1 + 5x_2 + 12x_3 &=& 1000
\end{array} \tag{1}
\]


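Use multiples of equation (a) to eliminate $x_1$ from (b) and (c). First,
subtract $\tfrac{1}{2}$ times (a) from (b) to eliminate the $10x_1$ term from (b) and
obtain a new second equation (b').

\[
\begin{array}{lrcr}
\text{(b)} & 10x_1 + 14x_2 + 5x_3 &=& 850 \\
-\tfrac{1}{2}\text{(a)} & -(10x_1 + 2x_2 + 2x_3 &=& 250) \\
\hline
\text{(b')} = \text{(b)} - \tfrac{1}{2}\text{(a)} & 0 + 12x_2 + 3x_3 &=& 600
\end{array}
\]

In a similar fashion we subtract $\tfrac{1}{4}$ times (a) from (c) to eliminate
the $5x_1$ term from (c) and obtain a new equation (c'):

\[
\begin{array}{lrcr}
\text{(c)} & 5x_1 + 5x_2 + 12x_3 &=& 1000 \\
-\tfrac{1}{4}\text{(a)} & -(5x_1 + x_2 + x_3 &=& 125) \\
\hline
\text{(c')} = \text{(c)} - \tfrac{1}{4}\text{(a)} & 4x_2 + 11x_3 &=& 875
\end{array}
\]

Our new system of equations is now

\[
\begin{array}{lrcr}
\text{(a)} & 20x_1 + 4x_2 + 4x_3 &=& 500 \\
\text{(b')} & 12x_2 + 3x_3 &=& 600 \\
\text{(c')} & 4x_2 + 11x_3 &=& 875
\end{array} \tag{2}
\]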


Next we use equation (b') to eliminate the $4x_2$ term from (c')
and obtain a new third equation (c'').

\[
\begin{array}{lrcr}
\text{(c')} & 4x_2 + 11x_3 &=& 875 \\
-\tfrac{1}{3}\text{(b')} & -(4x_2 + x_3 &=& 200) \\
\hline
\text{(c'')} = \text{(c')} - \tfrac{1}{3}\text{(b')} & 10x_3 &=& 675
\end{array}
\]

Our new system of equations is

\[
\begin{array}{lrcr}
\text{(a)} & 20x_1 + 4x_2 + 4x_3 &=& 500 \\
\text{(b')} & 12x_2 + 3x_3 &=& 600 \\
\text{(c'')} & 10x_3 &=& 675
\end{array} \tag{3}
\]

By properties 1 and 2, any solution to the original system (1) is
a solution to the new system (3). Furthermore, by reversing the steps
in going from (1) to (3) [so that (1) is formed from linear combinations
of the equations in (3)], we also have that any solution to (3) is a
solution to (1).


Now (3) is in upper triangular form and we can solve using back
substitution. From (c'') we have

\[
x_3 = \tfrac{675}{10} = 67\tfrac{1}{2}
\]

and giving this value for $x_3$ in (b'), we have

\[
12x_2 + 3(67\tfrac{1}{2}) = 600
\quad\text{or}\quad
12x_2 = 600 - 202\tfrac{1}{2} \;\rightarrow\; x_2 = 33\tfrac{1}{8}
\]

and substituting these values for $x_2$ and $x_3$ in (a), we have

\[
20x_1 + 4(33\tfrac{1}{8}) + 4(67\tfrac{1}{2}) = 500
\quad\text{or}\quad
x_1 = \frac{500 - 402\tfrac{1}{2}}{20} = 4\tfrac{7}{8}
\]

So the vector of production levels of the three respective refineries
is $(4\tfrac{7}{8}, 33\tfrac{1}{8}, 67\tfrac{1}{2})$. Recall that in Section 1.2, by trial and error
we had obtained the estimated solution vector (5, 33, 68), a pretty
good guess.


Example 3. System of Equations Without 
Unique Solution 


Suppose that we change the third equation in Example 2 so that our
system is now

\[
\begin{array}{lrcr}
\text{(a)} & 20x_1 + 4x_2 + 4x_3 &=& 500 \\
\text{(b)} & 10x_1 + 14x_2 + 5x_3 &=& 850 \\
\text{(c)} & 10x_1 - 10x_2 - x_3 &=& -350
\end{array} \tag{4}
\]

After eliminating $x_1$ from (b) and (c) as above, we have

\[
\begin{array}{lrcr}
\text{(a)} & 20x_1 + 4x_2 + 4x_3 &=& 500 \\
\text{(b')} & 12x_2 + 3x_3 &=& 600 \\
\text{(c')} & -12x_2 - 3x_3 &=& -600
\end{array} \tag{5}
\]

Next we add (b') to (c') to eliminate the $-12x_2$ term, but this
eliminates all of (c').

\[
\text{(c'')} = \text{(c')} + \text{(b')}: \qquad 0 = 0
\]

That is, equation (c') is just minus (b'). We have only two equations
in three unknowns. This system has an infinite number of solutions,
since we can pick any value for $x_3$ and then knowing $x_3$ we can
determine $x_2$ from (b') and then $x_1$ from (a).

Let us reconsider (4) with the third equation replaced by

\[
\text{(c)} \qquad 10x_1 - 10x_2 - x_3 = 300
\]

then (c') would have been

\[
\text{(c')} \qquad -12x_2 - 3x_3 = 50
\]

Now when we use (b') to eliminate the $-12x_2$ term in (c'), we get

\[
\text{(c'')} = \text{(c')} + \text{(b')}: \qquad 0 = 650
\]

That is, (b') and (c') are inconsistent equations, and this new system
has no solution.

The reader should check that the coefficient matrix in (4) has
determinant 0. The reason is that the first row minus the second row
equals the third row.
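The three outcomes seen in Examples 2 and 3 (a unique solution, infinitely many solutions, no solution) can be told apart mechanically by carrying out the elimination and inspecting what is left. The sketch below is one hedged way to do that. The function name `classify` is hypothetical, and it swaps rows when it meets a zero pivot, a step that goes slightly beyond what has been described so far.

```python
from fractions import Fraction


def classify(aug):
    """Return 'unique', 'infinite', or 'none' for an augmented matrix [A | b]."""
    rows = [[Fraction(v) for v in row] for row in aug]
    n_rows, n_vars = len(rows), len(rows[0]) - 1
    pivot_row = 0
    for col in range(n_vars):
        if pivot_row == n_rows:
            break
        # Look for a usable (nonzero) pivot in this column.
        src = next((r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None)
        if src is None:
            continue  # nothing to eliminate with in this column
        rows[pivot_row], rows[src] = rows[src], rows[pivot_row]
        for r in range(pivot_row + 1, n_rows):
            mult = rows[r][col] / rows[pivot_row][col]
            rows[r] = [x - mult * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    # A row reading 0 = (nonzero number) signals inconsistent equations.
    for row in rows:
        if all(v == 0 for v in row[:-1]) and row[-1] != 0:
            return "none"
    return "unique" if pivot_row == n_vars else "infinite"


example_2 = [[20, 4, 4, 500], [10, 14, 5, 850], [5, 5, 12, 1000]]
example_3a = [[20, 4, 4, 500], [10, 14, 5, 850], [10, -10, -1, -350]]
example_3b = [[20, 4, 4, 500], [10, 14, 5, 850], [10, -10, -1, 300]]
print(classify(example_2), classify(example_3a), classify(example_3b))
```

Exact fractions are used so that a row that should reduce to all zeros really does, rather than leaving tiny rounding residue.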


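Suppose that we have a system of n equations in n unknowns

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n &= b_i \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned} \tag{6}
\]

Since the first equation begins $a_{11}x_1 + \cdots$ and the ith equation begins
$a_{i1}x_1 + \cdots$, then multiplying the first equation by $a_{i1}/a_{11}$ will yield a new
equation that begins $a_{i1}x_1 + \cdots$, that is,

\[
\frac{a_{i1}}{a_{11}}\,(a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1)
\]

equals

\[
a_{i1}x_1 + \frac{a_{i1}}{a_{11}}a_{12}x_2 + \cdots + \frac{a_{i1}}{a_{11}}a_{1n}x_n = \frac{a_{i1}}{a_{11}}b_1 \tag{7}
\]

Subtracting (7) from the ith equation in (6) yields

\[
\left(a_{i2} - \frac{a_{i1}}{a_{11}}a_{12}\right)x_2 + \cdots + \left(a_{in} - \frac{a_{i1}}{a_{11}}a_{1n}\right)x_n = b_i - \frac{a_{i1}}{a_{11}}b_1 \tag{8}
\]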


By performing the steps in (7) and (8) for $i = 2, 3, \ldots, n$, we can eliminate
the $x_1$ term from every equation except the first, so that now, the second
through nth equations will form a system of $n - 1$ equations in $n - 1$
unknowns. We repeat the elimination process with this (n-1)-by-(n-1)
system, eliminating the $x_2$ term from the third through nth equations. We
continue this method of eliminating variables until we finally have one equation
in $x_n$, which is trivial to solve.

Once $x_n$ is known, we can work backwards to determine the value of
$x_{n-1}$, then of $x_{n-2}$, and so on, as in the previous examples. We are assuming
here that when it is time to eliminate $x_j$ from equations $j + 1$ through n,
the coefficient of $x_j$ in the current jth equation is nonzero; otherwise, we
cannot use the jth equation to eliminate $x_j$ from other equations. We discuss
the case where this coefficient is zero shortly.

Since Gaussian elimination involves only coefficients, the variables are
just excess baggage. Thus, after stating a problem in equation form, we can 
perform the elimination algorithm on the coefficient matrix augmented with 
the right-side vector. 
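Working on the augmented matrix makes the two stages easy to write down as a short program. What follows is a minimal sketch, not a production solver: the name `gaussian_solve` is hypothetical, exact fractions are an implementation choice, and the code stops with an error if it meets a zero on the diagonal, the case the text defers.

```python
from fractions import Fraction


def gaussian_solve(aug):
    """Solve an n-by-n system given as an augmented matrix [A | b]."""
    n = len(aug)
    m = [[Fraction(v) for v in row] for row in aug]

    # Stage 1: use equation j to eliminate x_j from equations j+1, ..., n.
    for j in range(n - 1):
        if m[j][j] == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no row exchanges")
        for i in range(j + 1, n):
            mult = m[i][j] / m[j][j]
            m[i] = [x - mult * y for x, y in zip(m[i], m[j])]

    # Stage 2: back substitution, from the last unknown up to the first.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        if m[i][i] == 0:
            raise ZeroDivisionError("no unique solution")
        known = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


refinery = [[20, 4, 4, 500], [10, 14, 5, 850], [5, 5, 12, 1000]]
print([str(v) for v in gaussian_solve(refinery)])
```

The multipliers computed inside the first loop are the same fractions used by hand in Example 2.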

Let us try out Gaussian elimination in this format on a familiar larger 
system, of four equations in four unknowns. 


Example 4. Solving Leontief 
Supply—Demand Equations 


Use Gaussian elimination to solve the supply—demand equations intro- 
duced in Section 1.2 for a sample Leontief economic model. 


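(Each supply equals the industrial demands of the Energy, Construction,
Transportation, and Steel industries, in that order, plus the consumer
demand.)

\[
\begin{array}{lrcl}
\text{Energy:} & x_1 &=& .4x_1 + .2x_2 + .2x_3 + .2x_4 + 100 \\
\text{Construct.:} & x_2 &=& .3x_1 + .3x_2 + .2x_3 + .1x_4 + 50 \\
\text{Transport.:} & x_3 &=& .1x_1 + .1x_2 \phantom{{}+ .2x_3} + .2x_4 + 100 \\
\text{Steel:} & x_4 &=& \phantom{.1x_1 + {}} .1x_2 + .1x_3
\end{array} \tag{9}
\]

Bringing the $x_i$'s over to the left side of the equations, we have the
system

\[
\begin{array}{lrcr}
\text{(a)} & .6x_1 - .2x_2 - .2x_3 - .2x_4 &=& 100 \\
\text{(b)} & -.3x_1 + .7x_2 - .2x_3 - .1x_4 &=& 50 \\
\text{(c)} & -.1x_1 - .1x_2 + x_3 - .2x_4 &=& 100 \\
\text{(d)} & -.1x_2 - .1x_3 + x_4 &=& 0
\end{array} \tag{10}
\]

Changing notation to the augmented coefficient matrix yields

\[
\begin{array}{l}
\text{(a)} \\ \text{(b)} \\ \text{(c)} \\ \text{(d)}
\end{array}
\left[\begin{array}{rrrr|r}
.6 & -.2 & -.2 & -.2 & 100 \\
-.3 & .7 & -.2 & -.1 & 50 \\
-.1 & -.1 & 1 & -.2 & 100 \\
0 & -.1 & -.1 & 1 & 0
\end{array}\right]
\]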

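First we use multiples of equation (a) to eliminate the $x_1$ term from
equations (b), (c), and (d), that is, to make entries (2, 1), (3, 1), and
(4, 1) zero.

\[
\begin{array}{l}
\text{(a)} \\ \text{(b')} = \text{(b)} + \tfrac{1}{2}\text{(a)} \\ \text{(c')} = \text{(c)} + \tfrac{1}{6}\text{(a)} \\ \text{(d')} = \text{(d)}
\end{array}
\left[\begin{array}{rrrr|r}
.6 & -.2 & -.2 & -.2 & 100 \\
0 & .6 & -.3 & -.2 & 100 \\
0 & -.133 & .967 & -.233 & 116.67 \\
0 & -.1 & -.1 & 1 & 0
\end{array}\right] \tag{11}
\]

Next we make entries (3, 2) and (4, 2) zero.

\[
\begin{array}{l}
\text{(a)} \\ \text{(b')} \\ \text{(c'')} = \text{(c')} + \tfrac{2}{9}\text{(b')} \\ \text{(d'')} = \text{(d')} + \tfrac{1}{6}\text{(b')}
\end{array}
\left[\begin{array}{rrrr|r}
.6 & -.2 & -.2 & -.2 & 100 \\
0 & .6 & -.3 & -.2 & 100 \\
0 & 0 & .9 & -.278 & 138.89 \\
0 & 0 & -.15 & .967 & 16.67
\end{array}\right] \tag{12}
\]

Finally, we make entry (4, 3) zero.

\[
\begin{array}{l}
\text{(a)} \\ \text{(b')} \\ \text{(c'')} \\ \text{(d''')} = \text{(d'')} + \tfrac{1}{6}\text{(c'')}
\end{array}
\left[\begin{array}{rrrr|r}
.6 & -.2 & -.2 & -.2 & 100 \\
0 & .6 & -.3 & -.2 & 100 \\
0 & 0 & .9 & -.278 & 138.89 \\
0 & 0 & 0 & .920 & 39.81
\end{array}\right] \tag{13}
\]

System (13) is an upper triangular system that is equivalent to (10), in
the sense that both systems have the same solutions. In equation form,
(13) is

\[
\begin{array}{lrcr}
\text{(a)} & .6x_1 - .2x_2 - .2x_3 - .2x_4 &=& 100 \\
\text{(b')} & .6x_2 - .3x_3 - .2x_4 &=& 100 \\
\text{(c'')} & .9x_3 - .278x_4 &=& 138.89 \\
\text{(d''')} & .920x_4 &=& 39.81
\end{array} \tag{13'}
\]

Using back substitution, we obtain

\[
x_4 = 43.3, \quad x_3 = 167.7, \quad x_2 = 264.9, \quad x_1 = 325.3
\]

As expected, these numbers are close to the estimated answer we
obtained by iterated trial and error in Section 1.2.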


When we have another right-side vector b* for which the system Ax = b*
must be solved, it is natural to save some of the information from the
solution of Ax = b, since the operations performed in elimination depend
only on the left-side coefficients, not on the right-side vector. For example,
for a new set of consumer demands in the Leontief model above, all that
would change in (13) would be the numbers on the right side.

There are two natural sets of information to save. First is the final
reduced set of coefficients [e.g., the coefficients in (13)]; there is no
need to compute these numbers again. Second is the collection of multipliers
used to subtract the jth equation from the ith equation (i > j), since we
must perform these subtractions on the new right-side numbers.

Let U denote the upper triangular matrix of coefficients in the final
reduced system, and let L be the matrix of multipliers l_ij telling how many
times equation j is subtracted from equation i. For reasons to be explained
shortly, we set l_ii = 1.


Example 5. L and U Matrices for Re-solving 
Refinery Equations 


Consider the system of equations from Example 2:

(a)    20x_1 +  4x_2 +  4x_3 =  500
(b)    10x_1 + 14x_2 +  5x_3 =  850          (14)
(c)     5x_1 +  5x_2 + 12x_3 = 1000

whose final reduced system was

(a)    20x_1 +  4x_2 +  4x_3 = 500
(b')           12x_2 +  3x_3 = 600           (15)
(c'')                  10x_3 = 675

The reduced system matrix U is

       [ 20    4    4 ]
U  =   [  0   12    3 ]                      (16)
       [  0    0   10 ]


Recall that in Example 2 we eliminated x_1 from equations (b) and (c)
in (14) by subtracting 1/2 times (a) from (b) and 1/4 times (a) from (c).
Thus l_21 = 1/2 and l_31 = 1/4. Next we eliminated x_2 from the last equation
by subtracting 1/3 times (b') from (c'). Thus l_32 = 1/3. Putting 1’s on the
main diagonal, we have

       [  1     0    0 ]
L  =   [ 1/2    1    0 ]                     (17)
       [ 1/4   1/3   1 ]


To solve (14) for another right-side vector b*, simply perform 
the elimination steps on b* specified by L to get the final right-side 
vector b** and then solve the reduced system Ux = b** by back 
substitution. 

For example, suppose that b* = [400, 500, 600]. Then repeating 
the elimination steps (using L) on the new right sides, we have 


(a)                           = 400
(b)                           = 500
(c)                           = 600

(a)                           = 400
(b') = (b) - (1/2)(a)         = 300
(c') = (c) - (1/4)(a)         = 500

(a)                           = 400
(b')                          = 300
(c'') = (c') - (1/3)(b')      = 400

The new reduced system is

(a)    20x_1 +  4x_2 +  4x_3 = 400
(b')           12x_2 +  3x_3 = 300           (18)
(c'')                  10x_3 = 400


Using back substitution, we find

x_3 = 400/10 = 40

Then

x_2 = (300 - 3(40))/12 = 180/12 = 15

and finally

x_1 = (400 - 4(15) - 4(40))/20 = 180/20 = 9

So the new solution is [9, 15, 40].
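The bookkeeping of Example 5 can be sketched as follows. The sketch assumes no zero pivots arise; the names lu_decompose and lu_solve are our own, and Python's fractions module is used so that the multipliers 1/2, 1/4, and 1/3 are kept exactly.

```python
from fractions import Fraction


def lu_decompose(A):
    """Return (L, U) with A = LU, assuming no zero pivots arise."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            L[i][j] = U[i][j] / U[j][j]     # multiplier l_ij
            for k in range(j, n):
                U[i][k] -= L[i][j] * U[j][k]
    return L, U


def lu_solve(L, U, b):
    """Solve Ax = b given A = LU: reduce b using L, then back-substitute."""
    n = len(b)
    y = [Fraction(v) for v in b]
    for j in range(n):                      # same order as the elimination
        for i in range(j + 1, n):
            y[i] -= L[i][j] * y[j]
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


A = [[20, 4, 4], [10, 14, 5], [5, 5, 12]]   # refinery coefficients (14)
L, U = lu_decompose(A)
x_new = lu_solve(L, U, [400, 500, 600])     # new right side b*
print(x_new)
```

Once L and U are stored, each additional right-side vector costs only the forward pass over b and one back substitution; the elimination on the coefficients is never repeated.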
The reader can check that for the Leontief system in Example 4, the
matrices U and L are

       [ .6   -.2   -.2    -.2  ]             [   1      0      0     0 ]
U  =   [  0    .6   -.3    -.2  ]      L  =   [ -1/2     1      0     0 ]      (19)
       [  0     0    .9   -.278 ]             [ -1/6   -2/9     1     0 ]
       [  0     0     0    .920 ]             [   0    -1/6   -1/6    1 ]

Note that ignoring the 1’s on L’s main diagonal, the data in L and U 
can be stored together in one square matrix. 
Now we state a remarkable theorem. 


Theorem 1. Given any n-by-n matrix A, let the matrices L, U be as defined
above. Then

A = LU          (20)

Note that we are assuming that A’s rows are arranged so that no 0’s occur
on the main diagonal during elimination. Theorem 1 is proved at the end of
Section 5.2.

Let us check (20) for L and U in Example 5. We want to multiply: 


        [  1     0    0 ] [ 20    4    4 ]
LU  =   [ 1/2    1    0 ] [  0   12    3 ]          (21)
        [ 1/4   1/3   1 ] [  0    0   10 ]

Let us compute LU by the definition of matrix multiplication, which
says that the ith row in LU equals l_i^R U (where l_i^R is the ith row of L). So
the first row of the product LU in (21) is

l_1^R U = [1  0  0] U = 1[20  4  4] = [20  4  4]                    (22a)

The second row is

l_2^R U = [1/2  1  0] U = (1/2)[20  4  4] + 1[0  12  3]
        = [10  14  5]                                               (22b)

The third row is

l_3^R U = [1/4  1/3  1] U = (1/4)[20  4  4] + (1/3)[0  12  3] + 1[0  0  10]
        = [5  5  12]                                                (22c)

Putting the three rows of LU computed in (22a), (22b), and (22c) together,
we have A.
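The row-by-row check in (22a) to (22c) can be repeated mechanically. In this sketch the helper name row_times_matrix is our own; it forms a row of the product exactly as the text does, as a linear combination of the rows of U.

```python
from fractions import Fraction

# L and U for the refinery system, as given in (17) and (16).
L = [[Fraction(1), Fraction(0), Fraction(0)],
     [Fraction(1, 2), Fraction(1), Fraction(0)],
     [Fraction(1, 4), Fraction(1, 3), Fraction(1)]]
U = [[20, 4, 4], [0, 12, 3], [0, 0, 10]]


def row_times_matrix(row, M):
    """Return row*M, built up as a linear combination of the rows of M."""
    result = [Fraction(0)] * len(M[0])
    for coeff, m_row in zip(row, M):
        result = [r + coeff * v for r, v in zip(result, m_row)]
    return result


LU = [row_times_matrix(l_row, U) for l_row in L]
for product_row in LU:
    print([int(v) for v in product_row])
```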

The proof of this theorem is a generalization of the computation done
in (22a), (22b), and (22c). In the elimination process, we are forming new
equations as linear combinations of the original equations. Conversely, the
original equations are linear combinations of the final reduced equations.
The latter property is exactly what the computations in (22a), (22b), and
(22c) illustrate. For example, (22b) shows that a_2^R, the second row of A, is
the following linear combination of U’s rows: a_2^R = (1/2)u_1^R + u_2^R.

The LU decomposition of a square matrix has many important uses.
It yields a simple formula for the determinant of a square matrix and
also allows us to prove that elimination always finds a solution to Ax = b
if one exists.


Theorem 2. For any square matrix A,

det(A) = u_11 · u_22 · · · u_nn          (23)


That is, det(A) equals the product of main diagonal entries in U, where 
U is the reduced-system matrix in the decomposition A = LU. 


Proof. (i) Since A = LU, then det(A) = det(L) - det(U), by the 
product rule for determinants (Proposition 3 of Section 3.1). 

(ii) det(L) = 1, and det(U) = product of U’s main diagonal 
entries, since the determinant of a lower or upper triangular matrix 
(like L or U) is just the product of the main diagonal entries (Propo- 
sition 2 of Section 3.1). 

Combining parts (i) and (ii), we have formula (23).
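Theorem 2 suggests computing a determinant as a by-product of elimination. The following is a minimal sketch under the theorem's assumption that the rows need no rearranging; the function name det_by_elimination is our own, and the code raises an error rather than interchanging rows.

```python
from fractions import Fraction


def det_by_elimination(A):
    """det(A) as the product of the main diagonal entries of U.

    Assumes elimination needs no row interchanges.
    """
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    det = Fraction(1)
    for j in range(n):
        if U[j][j] == 0:
            if any(U[i][j] != 0 for i in range(j + 1, n)):
                raise ValueError("zero pivot; rearrange the rows first")
            return Fraction(0)              # rest of column j is zero, so det = 0
        det *= U[j][j]
        for i in range(j + 1, n):
            m = U[i][j] / U[j][j]
            for k in range(j, n):
                U[i][k] -= m * U[j][k]
    return det


print(det_by_elimination([[20, 4, 4], [10, 14, 5], [5, 5, 12]]))
```

For the refinery matrix the diagonal of U is 20, 12, 10, so the determinant is 20 · 12 · 10 = 2400.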


Implicit in this theorem is the fact that if one (or more) of the main
diagonal entries in U is 0, then det(A) = 0 and Ax = b does not have a
solution or the solution is nonunique. (Remember that we are assuming that
the rows of A are arranged to avoid 0’s on the main diagonal during
elimination, unless a whole row of 0’s occurs.) When this happens, the
elimination process fails, as happened in Example 3. Conversely, if det(A) ≠ 0,
the elimination cannot fail. Thus we have proven


Theorem 3. For any n-by-n matrix A and any n-vector b, Gaussian elimi- 
nation finds the unique solution to Ax = b if such a unique solution 
exists. 


We close this section by presenting a variation on Gaussian elimination 
that is a little slower but eliminates the need to do back substitution. This 
method is known as Gauss—Jordan elimination, but in this book we shall 
call it the method of elimination by pivoting. Pivoting yields a convenient 
way to calculate the inverse of a matrix (in Section 3.3) and arises in the 
solution of linear programs. 

Elimination by pivoting uses equation i to eliminate x_i from all
other equations before, as well as after, equation i (Gaussian elimination
only eliminates x_i in equations after equation i). It also divides equation i
by a_ii, so that the coefficient of x_i in the new equation i is 1.

We use the term pivot on entry a_ii (the coefficient of x_i in equation i)
to denote the process of using equation i to eliminate x_i from all other
equations (and make 1 be the new coefficient of x_i in equation i).


Example 6. Elimination by Pivoting 
Let us rework Example 2 using elimination by pivoting. 
(a)    20x_1 +  4x_2 +  4x_3 =  500
(b)    10x_1 + 14x_2 +  5x_3 =  850          (24)
(c)     5x_1 +  5x_2 + 12x_3 = 1000

We begin by expressing (24) in terms of an augmented coefficient
matrix.

(a)    [ 20    4    4  |   500 ]
(b)    [ 10   14    5  |   850 ]             (24')
(c)    [  5    5   12  |  1000 ]

Now we make entry (1, 1) one and the rest of the first column zeros.

(a') = (a)/20            [ 1   1/5   1/5  |   25 ]
(b') = (b) - 10(a')      [ 0    12     3  |  600 ]          (25)
(c') = (c) - 5(a')       [ 0     4    11  |  875 ]

Next we make entry (2, 2) one and the rest of the second column zeros.

(a'') = (a') - (1/5)(b'')    [ 1   0   3/20  |   15 ]
(b'') = (b')/12              [ 0   1   1/4   |   50 ]       (26)
(c'') = (c') - 4(b'')        [ 0   0    10   |  675 ]

Finally, we make entry (3, 3) one and the rest of the third column
zeros.

(a''') = (a'') - (3/20)(c''')    [ 1   0   0  |   4 7/8 ]
(b''') = (b'') - (1/4)(c''')     [ 0   1   0  |  33 1/8 ]   (27)
(c''') = (c'')/10                [ 0   0   1  |  67 1/2 ]

Note that the upper triangular system of equations corresponding to
(27) yields a solution directly, without back substitution.

x_1 =  4 7/8
x_2 = 33 1/8          (27')
x_3 = 67 1/2
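The pivot operation used in Example 6 can be sketched as a short routine. The function name pivot and the 0-based indexing are our own conventions; exact fractions are used so that the output matches the hand computation.

```python
from fractions import Fraction


def pivot(M, r, c):
    """Pivot on entry (r, c) of the augmented matrix M, in place (0-based)."""
    p = M[r][c]
    M[r] = [v / p for v in M[r]]            # make the pivot entry 1
    for i in range(len(M)):
        if i != r:
            m = M[i][c]
            M[i] = [v - m * w for v, w in zip(M[i], M[r])]   # zero out column c


# Augmented matrix (24') of the refinery system.
M = [[Fraction(v) for v in row] for row in
     [[20, 4, 4, 500], [10, 14, 5, 850], [5, 5, 12, 1000]]]
for k in range(3):                          # pivot on (1,1), (2,2), (3,3)
    pivot(M, k, k)
solution = [row[-1] for row in M]
print(solution)
```

After the three pivots the left part of M is the identity matrix, so the last column holds the solution 39/8, 265/8, 135/2, that is, 4 7/8, 33 1/8, 67 1/2.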
Actually, in elimination, one can use any equation to eliminate x_i from
the other equations. We illustrate the idea with elimination by pivoting, but
it also applies to Gaussian elimination.


Example 7. Solution with Off-Diagonal Pivoting 


Let us repeat Example 6 but with the numbers in equations (b) and (c)
changed:

(a)    [ 20    4    4  |  500 ]
(b)    [ 10    2    5  |  850 ]          (28)
(c)    [  5    5    9  |  525 ]

We want to make entry (1, 1) one and the rest of the first column
zeros.

(a') = (a)/20            [ 1   1/5   1/5  |   25 ]
(b') = (b) - 10(a')      [ 0    0     3   |  600 ]          (29)
(c') = (c) - 5(a')       [ 0    4     8   |  400 ]

Since entry (2, 2) is zero, we cannot pivot on it. Furthermore,
we cannot pivot on entry (1, 2) since we have already pivoted on an
entry in the first row. Thus we must pivot on entry (3, 2) and make it
one while the rest of the second column becomes zeros.

(a'') = (a') - (1/5)(c'')    [ 1   0   -1/5  |    5 ]
(b'') = (b')                 [ 0   0     3   |  600 ]       (30)
(c'') = (c')/4               [ 0   1     2   |  100 ]

Finally, we pivot on entry (2, 3).

(a''') = (a'') + (1/5)(b''')     [ 1   0   0  |    45 ]
(b''') = (b'')/3                 [ 0   0   1  |   200 ]     (31)
(c''') = (c'') - 2(b''')         [ 0   1   0  |  -300 ]

We read off the solution, x_1 = 45, x_2 = -300, x_3 = 200.


We close with two important comments about elimination. The first is
how to handle the problem of entry (i, i) being 0 when we want to use it to
eliminate x_i from the following equations; this occurred in (29). The
solution is to pivot on another nonzero entry, say entry (h, i), in the ith column;
in (29), we pivoted on entry (3, 2). An equivalent step is to interchange the
ith equation with the hth equation; after the interchange, entry (i, i) is
nonzero. In (29) we would interchange equations (b') and (c'). Such an
interchange works for Gaussian elimination as well as elimination by
pivoting.
Second, note that one cannot pivot twice in the same row. For example,
in system (30), if we pivoted on entry (3, 3), then when we used the third
equation to eliminate x_3 terms from other equations, we would be
reintroducing x_2 terms into the other equations (see Exercise 21).
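Both comments can be built into a pivoting routine. In the sketch below (the name solve_by_pivoting is our own), each column is pivoted in the first row that has a nonzero entry there and has not yet been used for a pivot. That selection rule is one simple choice made for illustration; the text requires only that the pivot be nonzero and lie in a new row.

```python
from fractions import Fraction


def solve_by_pivoting(A, b):
    """Elimination by pivoting, allowing off-diagonal pivots.

    Returns (x, pivots), where pivots lists the 1-based (row, column)
    positions that were used.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    row_for_col = {}
    used_rows = set()
    for c in range(n):
        # First unused row with a nonzero entry in column c.
        r = next((i for i in range(n)
                  if i not in used_rows and M[i][c] != 0), None)
        if r is None:
            raise ValueError("no usable pivot in column %d" % (c + 1))
        used_rows.add(r)
        row_for_col[c] = r
        p = M[r][c]
        M[r] = [v / p for v in M[r]]
        for i in range(n):
            if i != r:
                m = M[i][c]
                M[i] = [v - m * w for v, w in zip(M[i], M[r])]
    x = [M[row_for_col[c]][n] for c in range(n)]
    pivots = [(row_for_col[c] + 1, c + 1) for c in range(n)]
    return x, pivots


# System (28) of Example 7.
x, pivots = solve_by_pivoting([[20, 4, 4], [10, 2, 5], [5, 5, 9]],
                              [500, 850, 525])
print(pivots, x)
```

On system (28) this rule reproduces the pivots of Example 7, namely (1, 1), (3, 2), and (2, 3), and the solution 45, -300, 200.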

The LU decomposition exemplifies a very important aspect of com- 
puter science. In the LU decomposition, we transfer much of the work in 
solving the system Ax = b into a “data structure” problem. We “store”
A in the decomposed form of a lower and an upper triangular matrix, L and 
U. Computer science examines how one processes complex sets of infor- 
mation (in Europe, the subject is often called informatics). How data are 
organized (or preprocessed) into data structures is often more important in 
information processing than the subsequent computations. 

The LU decomposition is also a matrix algebra example of the com- 
puter science insight—that computer programs can be viewed as a special 
form of data. That is, programs are stored as a string of 0’s and 1’s just like 
other data (before programs were stored as data, computers had to be rewired 
for each new set of computations). The matrix L contains the elimination
multipliers, part of the Gaussian elimination “program,” which are used to
reduce any right-side vector b to b* after which back substitution solves
Ux = b*. Premultiplying the reduced-form matrix U by L to obtain A is
another instance where L acts like a program: the ith row of L times U
reconstructs the ith row of A as a linear combination of U’s rows:

a_i^R = l_i^R U = l_i1 u_1^R + l_i2 u_2^R + · · · + l_in u_n^R


Matrix multiplication makes it possible to use matrices as both data
and programs, just like a computer. By pre- and postmultiplying a data matrix
by the proper “program” matrices, one can do almost anything. At the core
of such computations is having the right data representation or data structure, 
be it a matrix decomposition or some other transformed form of the matrix. 
The key stage in virtually all modern numerical algorithms involving matri- 
ces is the preprocessing, to get the right representation of the data. 


Section 3.2 Exercises 


Summary of Exercises 
Exercises 1-16 involve Gaussian elimination computations. Exercises 17—20 
involve word problems. Exercises 21-25 are theoretical. 


1. Solve the following systems of equations by Gaussian elimination. 
(a) x + y=5 (b) 2x — 3y = 4 (c) 3x -—-y=0 
x-2=4 3x + 2y = 5 —2x + y = 2 
2. In each of the following sets of three equations, show that the third
equation equals the second equation minus some multiple of the first
equation: (c) = (b) - r(a) for some r.
(i) (a2) x + 2y=4 Gi) (2) x- yt z= 2 
CC)! ie Spee (c) —2x + 4y —- & = -3 


(iii) (a) 2x + y—2z= —5 


(b). 33.— yr Z= 8 
(c) 6x + Sy — 2z= .5 
3. Solve the following systems of equations using Gaussian elimination. 
(a) 2x, — 3x, + 2x, = 0 (OH) 2, = Kot X= 2 
Sy ae ky =F 2x, + 2x, — 4x, = —4 
—x, + 5x, + 4x, = 4 tm 2p Sx = SS 
(c) ~% — 3%y + 2 = 2 iQ}. 23, *+4e, — 2.=" 4 
2c, + x 3k, = F x, -— 2x, — 4x4, = -1 
DX, + 4x, + 6x3 + 12 S25 = Ay a5. = = 4 
(e) x, + x» + 4x, = 4 i). 2s, —'3%.— = 2 
Byyot Xe 3%, = 9 gx; 2%. ey = 


Sx, + 2x, + 5x, 


1] 9x, + 6x, + 4x, = | 


4. Solve the problems in Exercise 3 using elimination by pivoting 
(Gauss—Jordan elimination). 


5. (a) Write the LU decomposition for each coefficient matrix A in Ex- 
ercise 3. . 
(b) Multiply L times U to show that the product is A, for each coef- 
ficient matrix A in Exercise 3. 


6. Find the determinant of each matrix in Exercise 3 using Theorem 2. 
7. Re-solve each system in Exercise 3 with the new right-hand-side vector 
[10, 5, 10] using the numbers in the L and U matrices you found in 


Exercise 5. 


8. For the right-side vector b = [1, 2, 3], solve the system of equations 
Ax = b, where instead of A, you are given the LU decomposition 


of A. 
| 0 0 2 l ] 
whet th On. U= 1,0 3 2 
a aS 0 0 -—-2 
l 0 0 f =2 2 
(b) L =] —2 0 |, U=]0 5 2 
4 -1 | 0 0 2 


9. Solve the following systems of equations using Gaussian elimination 
and give the LU decomposition of the coefficient matrix. 


(a) x¢-+- 3%. f+ 2a & = 7 
ey Ake PO a te ee = 
2s, = 255 + 3p — 4 = 5 
Sy = Ske = 1 2S =4 
(Db) Sx, + 2%,.+ x, = 3 
Xr XX ly Sg = 
ay = RE Oy, = 3 
yo Aer Ke eal 0 
(C) ist ee ee ee a 4 
2% + X= $ 
3X; = 2%,.= 3 


4x, —— 2x> + X3 a 3X4 —_ 15 


10. Exercise 9, part (c) can be simplified by first solving the second and
third equations for x, and x,, and afterward solving for x, and x,. Solve
Exercise 9, part (c) this way.


11. Given the LU decomposition of an n-by-n matrix A, how many
multiplications are required to compute A as the matrix product LU (allowing
for known 0’s in L and U)?


12. Determine whether each of the following systems of equations has a
unique solution, multiple solutions, or is inconsistent.


(a) 2x —- 3y= 6 (b) x, + 2x, + 3x, = 10 
—6x + 9y = 12 2X, — XX + 4x, = 20 
5x, + 2x, = O 
eo cw. Bor eS SO (d) x, + XX + 2x, = 0 
Xx; + 3X> + 6x; — Y . 2X; + X> — 3X; — 0 
(fe) x, + & FF 2,= 3 (f) x + 26* Sx = «5 
=, — 2 + 4 = 8 I eS Be E'S 
By Key Bee 2 —5x, + 4x, + 10x, = 14 


13. Use Gaussian elimination to solve the following variations on the
refinery problem in Example 2. Sometimes the variation will have no
solution, sometimes multiple solutions (express such an infinite family of
solutions in terms of x_3), and sometimes the solution will involve
negative numbers (a real-world impossibility).


(a) 20x, + 4x, + 4x, = 300 (b) 6x, + 5x + 6x; = 500 
Sx, FF Sxo.+F 3x, 850 10x, + 10x, 850 
Ax, + 5x. + 11x; = 2050 2X) + 12x, = 1000 
(c) 6x, + 2x, + 2x, = 500 (dd) 8x, + 4%, + 3x, = 500 
3x, + 6x, + 3x, = 300 4x, + 8x, + 5x; = 500 
3x, + 2x, + 6x; = 1000 12x, + 6x, = 500 


14. Solve each system of equations in Exercise 3 with elimination by
pivoting in which off-diagonal pivots are used; to be exact, pivot on entry
(2, 1), then on (3, 2), and finally on (1, 3).


15. Solve the following systems of equations by Gaussian elimination.
When you come to a zero entry on the main diagonal, interchange
equations as appropriate.


(a) x, + 2x, + 3x, = 6 (b) xX, —- 3% + x3 = 4 
2x, + 4x, + 5x, = 12 —2x, + 6x, — 2x, = 7 
2%; + 3X%5:— 323 = 10 y+ Be B= S 

(cc) 4% - w+ea +X, = 6 (d) X, + xX, 0) 
2X; =i — 5, = 5 Ia + i = | 
ys att & +x,= 4 x; — X= 2 

XH +X, - X= 3 xX, + X = 3 

16. The following systems of equations are large, but their special
tridiagonal form makes them easy to solve. Solve them.

(GS). Xe =" He = 2 
=, + 25 —. Xs = 0 

= ey Zig Xe = 0 
— Xs + 2%-— ds = 0 
— ot ee sz 
(b) 2x, + xX = | 
xX +2y+ & = | 
Xy 2k, + OX = | 
Xe 2X Rg = 1 
X, + 2x5 = I 
17. The staff dietician at the California Institute of Trigonometry has to
make up a meal with 600 calories, 20 grams of protein, and 200
milligrams of vitamin C. There are three food types to choose from: rubbery
jello, dried fish sticks, and mystery meat. They have the following
nutritional content per ounce.

              Jello    Fish Sticks    Mystery Meat
Calories        10         50             200
Protein          1          3               2
Vitamin C       30         10               0

Set up and solve a system of equations to determine how much of each
food should be used.


18. A furniture manufacturer makes tables, chairs, and sofas. In one month,
the company has available 300 units of wood, 350 units of labor, and
225 units of upholstery. The manufacturer wants a production schedule
for the month that uses all of these resources. The different products
require the following amounts of the resources.

              Table    Chair    Sofa
Wood            4        1       3
Labor           3        2       5
Upholstery      2        0       4

Set up and solve a system of equations to determine how much of each
product should be manufactured.


19. A company has a budget of $280,000 for computing equipment. Three
types of equipment are available: microcomputers at $2000 apiece,
terminals at $500 apiece, and word processors at $5000 apiece. There
should be five times as many terminals as microcomputers and two
times as many microcomputers as word processors. Set this problem up
as a system of 3 linear equations and solve to determine how many
machines of each type there should be.


20. An investment analyst is trying to find out how much business a
secretive TV manufacturer has. The company makes three brands of TV:
Brand A, Brand B, and Brand C. The analyst learns that the
manufacturer has ordered from suppliers 450,000 type-1 circuit boards, 300,000
type-2 circuit boards, and 350,000 type-3 circuit boards. Brand A uses
2 type-1 boards, 1 type-2 board, and 2 type-3 boards. Brand B uses 3
type-1 boards, 2 type-2 boards, and 1 type-3 board. Brand C uses 1
board of each type. How many TV’s of each brand are being
manufactured?


21. This exercise shows why each pivot (in elimination by pivoting) must
be in a different row.

(a) In Example 7, make the third pivot on entry (3, 3) instead of on
entry (2, 3). Can you still read off the solution?

(b) In Exercise 3, part (b), make the following sequence of pivots:
entry (3, 1), entry (2, 2), then entry (2, 3). Does this provide a
solution to the system of equations?


22. For what values of k does the following refinery-type system of
equations have a unique solution with all x_i nonnegative?
6x_1 + 5x_2 + 3x_3 = 500
4X; + a 155 
Sx, + kx, + 5x; = 1000 


= 


23. For an arbitrary 2-by-2 system of equations

ax + by = r
cx + dy = s


(a) Determine the LU decomposition of the coefficient matrix A. 
(b) Verify that L times U equals A. 


24. Use Theorem 2 to show that in Ax = b, if one row of A is all 0's, 
then det(A) = 0. 


25. Consider the following 3-by-3 matrix whose entries are functions. Find 
the LU decomposition of this matrix and find its determinant. 


6 Ay: ete 


Computer Projects 
26. Write a computer program to perform Gaussian elimination on a system 
of n equations in n unknowns (watch out for 0’s on the main diagonal). 


27. Write a computer program to perform elimination by pivoting on a
system of n equations in n unknowns (watch out for 0 pivots).


Section 3.3 The Inverse of a Matrix 


In this section we study a general method for solving a system of equations 
Ax = b for any b, instead of for one particular b as in Sections 3.1 
and 3.2. 


Any matrix A has an additive inverse, the negative -A (obtained by
changing the sign of all entries), such that

A + (-A) = 0

A multiplicative inverse A^{-1} has the property

AA^{-1} = I and A^{-1}A = I

where I is the identity matrix. Inverses allow us to “solve” a system of
equations symbolically the way one solves the scalar equation ax = b by
dividing both sides by a, obtaining x = a^{-1}b.

Theorem 1. If A has an inverse A^{-1}, then the system of equations
Ax = b has the solution x = A^{-1}b.

Proof. As in the one-variable case, we divide both sides of Ax = b
by A, that is, multiply both sides by A^{-1}:

A^{-1}(Ax) = A^{-1}b          (1)

Using matrix algebra and the fact that A^{-1}A = I, we can rewrite the
left side of (1), A^{-1}(Ax), as x. The details of this rewriting are

A^{-1}(Ax) = (A^{-1}A)x = Ix = x          (2)

Combining (1) and (2), we have the desired result: x = A^{-1}b.


A matrix A is invertible if it has an inverse. In many books the term 
nonsingular is used instead of invertible; a singular matrix has no inverse. 
Some matrices are invertible and some are not. Much of the theory of linear 
algebra centers around conditions that will make a matrix invertible. We 
note that if a matrix A has an inverse A^{-1}, the inverse is unique (see
Exercise 17).

In Section 3.1 we saw that finding a (unique) solution to a system of
linear equations was dependent on whether the associated coefficient matrix
A had a nonzero determinant. Now we have another sufficient condition,
the existence of A^{-1}.

We will show shortly how to calculate inverses, when they exist. First, 
let us verify that certain matrices do and do not have inverses. 


Example 1. Matrices With and Without Inverses 


                [ 3   1 ]                              [  1   -1/2 ]
(i) Matrix A =  [ 4   2 ]  has the inverse  A^{-1}  =  [ -2    3/2 ]

(For the present, do not worry how this inverse was found.)
Multiplying A times A^{-1}, we have

[ 3   1 ] [  1   -1/2 ]     [ 3(1) + 1(-2)    3(-1/2) + 1(3/2) ]     [ 1   0 ]
[ 4   2 ] [ -2    3/2 ]  =  [ 4(1) + 2(-2)    4(-1/2) + 2(3/2) ]  =  [ 0   1 ]

The reader can verify that if A^{-1} precedes A, again A^{-1}A = I.


| 
(ii) We claim that the matrix B = 


has no inverse. 
oe 
The key to our claim is the observation that the second row is
twice the first row:

b_2^R = 2b_1^R          (3)

where b_i^R denotes the ith row of B.
Suppose that C were the inverse of B, so that BC = I. If c_1^C and c_2^C
are the two columns of C, the matrix product BC is the following
collection of scalar products:

        [ b_1^R · c_1^C    b_1^R · c_2^C ]     [ 1   0 ]
BC  =   [ b_2^R · c_1^C    b_2^R · c_2^C ]  =  [ 0   1 ]          (4)

From (3), b_2^R · c_1^C = 2b_1^R · c_1^C, and likewise for c_2^C. So the
second row of BC, [b_2^R · c_1^C, b_2^R · c_2^C], must be twice the first row,
[b_1^R · c_1^C, b_1^R · c_2^C], but the second row of I is not twice its first
row. This contradiction shows that no inverse can exist.


Example 1, part (ii) shows that if one row of A is a multiple of another
row, no inverse can exist. This result complements Proposition 1 of Section
3.1, which says that if one row is a multiple of another, det(A) = 0 (so no
unique solution to Ax = b exists).


Remember that the inverse of a matrix, when it exists, is unique. The 
following example shows how to compute the inverse of a matrix. 


Example 2. Computing Inverse of 
a 2-by-2 Matrix 


Consider the 2-by-2 matrix A and its (unknown) inverse X:

       [ 3   1 ]             [ x_11   x_12 ]
A  =   [ 4   2 ]      X  =   [ x_21   x_22 ]

We require that AX = I:

[ 3   1 ] [ x_11   x_12 ]     [ 1   0 ]
[ 4   2 ] [ x_21   x_22 ]  =  [ 0   1 ]          (5)

We determine X (= A^{-1}) a column at a time. First we set
Ax_1^C, the first column in the product AX of (5), equal to the first column
of I:

3x_11 +  x_21 = 1          (6a)
4x_11 + 2x_21 = 0

Similarly, the second column yields the system

3x_12 +  x_22 = 0          (6b)
4x_12 + 2x_22 = 1

Using elimination by pivoting on the augmented coefficient matrix
for (6a), we obtain

[ 3   1  |  1 ]       [ 1   1/3  |   1/3 ]       [ 1   0  |   1 ]
[ 4   2  |  0 ]  -->  [ 0   2/3  |  -4/3 ]  -->  [ 0   1  |  -2 ]          (7a)

so x_11 = 1, x_21 = -2. For (6b) we obtain

[ 3   1  |  0 ]       [ 1   1/3  |  0 ]       [ 1   0  |  -1/2 ]
[ 4   2  |  1 ]  -->  [ 0   2/3  |  1 ]  -->  [ 0   1  |   3/2 ]          (7b)

so x_12 = -1/2, x_22 = 3/2.
Substituting these values for x_ij back into X (= A^{-1}), we have

            [  1   -1/2 ]
A^{-1}  =   [ -2    3/2 ]


Although elimination is the preferred way to solve a system of
equations, Cramer’s rule yields an easy-to-remember formula for solving the
equations for the inverse of a 2-by-2 matrix. For a general 2-by-2 matrix A,
the system of equations Ax_1^C = e_1 [like (6a)] has the solution by Cramer’s
rule:

          | 1   a_12 |                                | a_11   1 |
          | 0   a_22 |        a_22                    | a_21   0 |       -a_21
x_11  =  ------------  =  --------,       x_21  =  -------------  =  --------          (8)
            det(A)          det(A)                     det(A)          det(A)

The simple form of the numerator comes from having a right-side
vector of [1, 0]. The same simplification occurs in solving Ax_2^C = e_2 by
Cramer’s rule (we have the same system of equations except that the
right-side vector is now [0, 1]). The reader should check that with [0, 1], (8) now
yields the solution: x_12 = -a_12/det(A), x_22 = a_11/det(A).

These single-number numerators lead to the following general formula
for the inverse of a 2-by-2 matrix:

Formula for Inverse of a 2-by-2 Matrix

          [ a_11   a_12 ]                             1      [  a_22   -a_12 ]
If A  =   [ a_21   a_22 ],   then   A^{-1}  =   --------    [ -a_21    a_11 ]          (9)
                                                 det(A)

In words, the inverse of a 2-by-2 matrix A is obtained as follows: Divide all
entries of A by the determinant, then interchange the two diagonal entries
and change the sign of the two off-diagonal entries.
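Formula (9) translates directly into code. In this brief sketch the function name inverse_2x2 is our own, and None is returned when det(A) = 0, in which case the formula does not apply.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of a 2-by-2 matrix by formula (9); None if det(A) = 0."""
    (a11, a12), (a21, a22) = A
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        return None
    # Swap the diagonal entries, negate the off-diagonal ones, divide by det.
    return [[a22 / det, -a12 / det],
            [-a21 / det, a11 / det]]


print(inverse_2x2([[3, 1], [4, 2]]))
```

For the matrix of Example 2, det(A) = 2 and the result is the matrix with rows [1, -1/2] and [-2, 3/2], as found there by pivoting.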

The method in Example 2 for finding the inverse of A a column at a 
time can be applied to any matrix. 


Theorem 2. Let A be an n-by-n matrix and e_j be the jth unit n-vector,
e_j = [0, 0, . . . , 1, . . . , 0]. If the n-vector x_j is the solution to the
matrix equation

Ax_j = e_j          (10)

for j = 1, 2, . . . , n, then the n-by-n matrix X with column vectors
x_j is the inverse of A:

X = [x_1  x_2  · · ·  x_n] = A^{-1}          (11)

Note: If the systems Ax_j = e_j do not have solutions, A does not have
an inverse.

Recall that when we solve a system of equations by elimination, the
right sides play a passive role. That is, using a different right side b does
not change any of the calculations involving the coefficients. It affects only
the final values that appear on the right side. Thus, when we performed
elimination by pivoting on the coefficient matrix in (7a) and (7b) of Example
2, we could have simultaneously applied the elimination steps to an
augmented coefficient matrix [A I] that contained both right-side vectors. The
computations would be

[ 3   1  |  1   0 ]       [ 1   1/3  |   1/3   0 ]       [ 1   0  |   1   -1/2 ]
[ 4   2  |  0   1 ]  -->  [ 0   2/3  |  -4/3   1 ]  -->  [ 0   1  |  -2    3/2 ]

So starting with [A I], elimination by pivoting yields [I A^{-1}].


Computation of the Inverse of an n-by-n Matrix A. Pivot on entries
(1, 1), (2, 2), . . . , (n, n) in the augmented matrix [A I]. The
resulting array will be [I A^{-1}].

In Section 3.2 we learned that the LU decomposition can be used to
solve Ax = b for several different b’s. The LU decomposition and pivoting
on the augmented matrix [A I] are equally fast ways to find the inverse.
In hand computation, the augmented matrix method is easier.
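The pivoting procedure on [A I] can be sketched as below. The function name inverse_by_pivoting is our own. One liberty is taken: when a diagonal entry is 0 the sketch interchanges rows (the equivalent of an off-diagonal pivot, as discussed in Section 3.2) instead of stopping, and it returns None only when no nonzero entry is available in the column.

```python
from fractions import Fraction


def inverse_by_pivoting(A):
    """Pivot down the diagonal of [A I]; return A^{-1}, or None if singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        # Find a row at or below k with a nonzero entry in column k.
        r = next((i for i in range(k, n) if M[i][k] != 0), None)
        if r is None:
            return None                     # no pivot available
        M[k], M[r] = M[r], M[k]             # interchange equations if needed
        p = M[k][k]
        M[k] = [v / p for v in M[k]]
        for i in range(n):
            if i != k:
                m = M[i][k]
                M[i] = [v - m * w for v, w in zip(M[i], M[k])]
    return [row[n:] for row in M]           # right half is the inverse


print(inverse_by_pivoting([[3, 1], [4, 2]]))
```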


Example 3. Inverse of a 3-by-3 Matrix 
Let

       [ 1   0   2 ]
A  =   [ 2   4   2 ]
       [ 1   2   6 ]

We compute the inverse using pivoting on the augmented matrix
[A I].

            (a)    [ 1   0   2  |  1   0   0 ]
[A I]  =    (b)    [ 2   4   2  |  0   1   0 ]
            (c)    [ 1   2   6  |  0   0   1 ]

(a') = (a)                  [ 1   0    2  |   1   0   0 ]
(b') = (b) - 2(a)           [ 0   4   -2  |  -2   1   0 ]
(c') = (c) - (a)            [ 0   2    4  |  -1   0   1 ]
                                                                        (13)
(a'') = (a')                [ 1   0     2   |    1      0    0 ]
(b'') = (b')/4              [ 0   1   -1/2  |  -1/2    1/4   0 ]
(c'') = (c') - 2(b'')       [ 0   0     5   |    0    -1/2   1 ]

(a''') = (a'') - 2(c''')        [ 1   0   0  |    1      1/5   -2/5 ]
(b''') = (b'') + (c''')/2       [ 0   1   0  |  -1/2     1/5    1/10 ]
(c''') = (c'')/5                [ 0   0   1  |    0    -1/10    1/5 ]

Thus

            [   1      1/5   -2/5 ]
A^{-1}  =   [ -1/2     1/5    1/10 ]
            [   0    -1/10    1/5 ]


If we have to solve a system of equations Ax = b for many different
right-hand sides, it is useful to know A^{-1}. For each new b', we find the
solution of Ax = b' as x = A^{-1}b'. The inverse also lets us determine how
a small change Δb in b will affect our solution x.

Example 4. Use of Inverse in Multiple
Right-Hand Sides

In Example 6 of Section 3.2 we solved the refinery system of equations
by pivoting along the diagonal. Let us use the same sequence of pivots
with the augmented matrix [A I] to compute the inverse.


(a)                        [ 20    4    4  |  1   0   0 ]
(b)                        [ 10   14    5  |  0   1   0 ]
(c)                        [  5    5   12  |  0   0   1 ]

(a') = (a)/20              [ 1   1/5   1/5  |  1/20   0   0 ]
(b') = (b) - 10(a')        [ 0    12     3  |  -1/2   1   0 ]
(c') = (c) - 5(a')         [ 0     4    11  |  -1/4   0   1 ]
                                                                        (14)
(a'') = (a') - (b'')/5     [ 1   0   3/20  |   7/120   -1/60   0 ]
(b'') = (b')/12            [ 0   1   1/4   |  -1/24     1/12   0 ]
(c'') = (c') - 4(b'')      [ 0   0    10   |  -1/12    -1/3    1 ]

(a''') = (a'') - (3/20)(c''')    [ 1   0   0  |  143/2400   -7/600   -3/200 ]
(b''') = (b'') - (1/4)(c''')     [ 0   1   0  |  -19/480    11/120   -1/40  ]
(c''') = (c'')/10                [ 0   0   1  |   -1/120    -1/30     1/10  ]

The inverse is, in decimals,

            [  .05958   -.01166   -.015 ]
A^{-1}  =   [ -.03958    .09167   -.025 ]          (15)
            [ -.00833   -.03333    .1   ]

If we were given a right-hand-side vector for the refinery system,
say the vector b' = [300, 200, 100], then the solution can be obtained
by computing x = A^{-1}b'.

                  [  .05958   -.01166   -.015 ] [ 300 ]
x = A^{-1}b'  =   [ -.03958    .09167   -.025 ] [ 200 ]
                  [ -.00833   -.03333    .1   ] [ 100 ]

      [  .05958 × 300 - .01166 × 200 - .015 × 100 ]
  =   [ -.03958 × 300 + .09167 × 200 - .025 × 100 ]
      [ -.00833 × 300 - .03333 × 200 +   .1 × 100 ]

      [  17.87 -  2.33 - 1.5 ]     [ 14.0 ]
  =   [ -11.87 + 18.33 - 2.5 ]  =  [  4.0 ]          (16)
      [  -2.5  -  6.67 + 10  ]     [   .8 ]



Observe that if Δb = [1, 0, 0], the solution Δx = A^{-1}Δb =
(A^{-1})_1^C, where (A^{-1})_1^C denotes the first column of A^{-1}. If we wanted
to increase production by one unit of the first product (change b to
b + Δb), the solution changes from x to A^{-1}b + A^{-1}Δb = x + Δx.
Thus the change equals Δx, which is (A^{-1})_1^C. Similarly, the second
and third columns tell how the solution will change if we need 1 more
unit of the second or third product. In sum, the columns of A^{-1} show
us how the solution x changes when the right-side vector b changes.
For example, to find the solution x* when we change from
b = [300, 200, 100] to b' = [300, 300, 100], we take the solution
x = A^{-1}b = [14.0, 4.0, .8] computed in (16) for b and change it
by A^{-1}Δb, where Δb = b' - b = [0, 100, 0]. So

               [ 300 ]              [ 300 ]              [   0 ]
x*  =  A^{-1}  [ 300 ]  =  A^{-1}   [ 200 ]  +  A^{-1}   [ 100 ]
               [ 100 ]              [ 100 ]              [   0 ]
                                                                        (17)
       [ 14.0 ]     [ -1.2 ]     [ 12.8 ]
    =  [  4.0 ]  +  [  9.2 ]  =  [ 13.2 ]
       [   .8 ]     [ -3.3 ]     [ -2.5 ]

The next two examples interpret the role of the inverse in two familiar
linear models.


Example 5. Decoding Alphabetic Messages

In Example 1 of Section 1.5 we introduced a scheme for encoding a
pair of letters L_1, L_2 as a coded pair C_1, C_2. Recall that each letter is
treated as a number between 1 and 26 (e.g., BABY is the numeric
sequence 2, 1, 2, 25) and arithmetic is done mod 26. We considered
the following instance of this scheme:

C_1 = 9L_1 + 17L_2  (mod 26)          (18)
C_2 = 7L_1 +  2L_2  (mod 26)

If L_1 = E (= 5) and L_2 = C (= 3), this pair of letters would be
encoded as the following pair C_1, C_2:

C_1 = 9 × 5 + 17 × 3 = 96 ≡ 18 (mod 26) = R
C_2 = 7 × 5 +  2 × 3 = 41 ≡ 15 (mod 26) = O

In matrix form, with c = (C_1, C_2), l = (L_1, L_2), and

       [ 9   17 ]
E  =   [ 7    2 ]

(18) becomes

c = El  (mod 26)

The person who receives the coded pair c will decode c back into the
original message pair l by using the inverse of E:

l = E^{-1}c

To use the formula (9) for a 2-by-2 inverse, we first compute

det(E) = 9 × 2 - 7 × 17 = -101 ≡ 3 (mod 26)

since -101 = -4 · 26 + 3.
Observe that 3 × 9 = 27 ≡ 1 (mod 26), and therefore
1/det(E) = 1/3 ≡ 9 (mod 26) (note that division mod 26 is not always
well defined, but the numbers in this case were chosen so that division
would work). By (9), we have

               1      [  2   -17 ]     [  9 × 2   -9 × 17 ]     [ 18   3 ]
E^{-1}  =  --------   [ -7     9 ]  =  [ -9 × 7    9 × 9  ]  ≡  [ 15   3 ]   (mod 26)
            det(E)

So the decoding equations are

L_1 = 18C_1 + 3C_2  (mod 26)          (19)
L_2 = 15C_1 + 3C_2  (mod 26)

For example, the pair R, O (= 18, 15) is decoded using (19) as

L_1 = 18 × 18 + 3 × 15 = 324 + 45 = 369 ≡ 5 (mod 26) = E
L_2 = 15 × 18 + 3 × 15 = 270 + 45 = 315 ≡ 3 (mod 26) = C

So R, O decode back to the original pair E, C, as required.
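The mod 26 arithmetic of Example 5 is easy to mechanize. This is only a sketch: the names inverse_2x2_mod and apply_mod are our own, and the three-argument form pow(det, -1, m) used for the reciprocal requires Python 3.8 or later (it raises ValueError when the determinant has no inverse mod m).

```python
def inverse_2x2_mod(E, m):
    """Inverse of a 2-by-2 integer matrix mod m, by formula (9)."""
    (a, b), (c, d) = E
    det = (a * d - b * c) % m
    det_inv = pow(det, -1, m)               # reciprocal of det(E) mod m
    return [[(det_inv * d) % m, (-det_inv * b) % m],
            [(-det_inv * c) % m, (det_inv * a) % m]]


def apply_mod(M, v, m):
    """Matrix-vector product Mv with every entry reduced mod m."""
    return [sum(a * x for a, x in zip(row, v)) % m for row in M]


E = [[9, 17], [7, 2]]
E_inv = inverse_2x2_mod(E, 26)
coded = apply_mod(E, [5, 3], 26)            # encode the pair E, C
decoded = apply_mod(E_inv, coded, 26)       # and decode it again
print(E_inv, coded, decoded)
```

The computed inverse has rows [18, 3] and [15, 3], matching the decoding equations (19); the pair (5, 3) encodes to (18, 15) and decodes back to (5, 3).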


Example 6. Reversing a Markov Chain 


The transition matrix $A$ in a Markov chain is used to compute the
probability distribution $p'$ in the next period from the current
probability distribution $p$ according to the matrix equation

$$p' = Ap \tag{20}$$

Suppose that we want to run the Markov chain backwards (earlier in time),
so that the relation in (20) becomes reversed and $p'$
is used to determine $p$. Then the new transition matrix should just be
$A^{-1}$, since solving (20) for $p$ yields

$$p = A^{-1}p'$$

We now try to invert the Markov chain for the frog-in-highway model
introduced in Section 1.3. The transition matrix augmented with the
identity matrix is

$$\left[\begin{array}{cccccc|cccccc} .50 & .25 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ .50 & .50 & .25 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & .25 & .50 & .25 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & .25 & .50 & .25 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & .25 & .50 & .50 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & .25 & .50 & 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right] \tag{21}$$


As usual we pivot down the main diagonal on entry (1, 1), then on
(2, 2), then (3, 3), then (4, 4), then (5, 5), and finally (6, 6). After
the first five pivots, we have

$$\left[\begin{array}{cccccc|cccccc} 1 & 0 & 0 & 0 & 0 & 1 & 10 & -8 & 6 & -4 & 2 & 0 \\ 0 & 1 & 0 & 0 & 0 & -2 & -16 & 16 & -12 & 8 & -4 & 0 \\ 0 & 0 & 1 & 0 & 0 & 2 & 12 & -12 & 12 & -8 & 4 & 0 \\ 0 & 0 & 0 & 1 & 0 & -2 & -8 & 8 & -8 & 8 & -4 & 0 \\ 0 & 0 & 0 & 0 & 1 & 2 & 4 & -4 & 4 & -4 & 4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & -1 & 1 & -1 & 1 \end{array}\right] \tag{22}$$

The elimination process fails because entry (6, 6) is 0. Recall that a
similar difficulty arose when we were performing elimination in
Example 2 of Section 3.2. The failure of the elimination process means
that the transition matrix $A$ is not invertible. Some Markov transition
matrices are invertible, but their inverse will have negative entries and
not make sense as a Markov chain (see the Exercises).

A Markov chain cannot run backward. To see why, let $p'$ be a
unit vector, say, $e_6 = [0, 0, 0, 0, 0, 1]$. A moment's thought shows
that there is no vector $p$ such that $Ap = e_6$. That is, there is no
distribution for the frog in the previous period that would force the
frog, with certainty, to be in state 6 now. ■
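The zero row produced by the elimination can be confirmed with exact arithmetic. The sketch below assumes the six-state frog transition matrix has the tridiagonal entries .50 and .25 from Section 1.3; the weights are the last row of the right-hand block in (22).

```python
from fractions import Fraction as F

q, h = F(1, 4), F(1, 2)

# Assumed frog-in-highway transition matrix (each column sums to 1).
A = [
    [h, q, 0, 0, 0, 0],
    [h, h, q, 0, 0, 0],
    [0, q, h, q, 0, 0],
    [0, 0, q, h, q, 0],
    [0, 0, 0, q, h, h],
    [0, 0, 0, 0, q, h],
]

# Row combination recorded in the last row of the right-hand block of (22).
weights = [-1, 1, -1, 1, -1, 1]

# Apply the same combination to the rows of A, one column at a time.
combo = [sum(w * A[i][j] for i, w in enumerate(weights)) for j in range(6)]
print(combo)  # every entry is zero, so the rows of A are linearly dependent
```

The same weights explain why $Ap = e_6$ has no solution: the weighted sum of the entries of $Ap$ is 0 for every $p$, whereas the weighted sum of the entries of $e_6$ is 1.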


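There is a simple algebraic way to explain the computation of the
inverse in Examples 3 to 6. We write the original system of equations
$Ax = b$ as

$$Ax = Ib \tag{23}$$

$$\begin{bmatrix} 20 & 4 & 4 \\ 10 & 14 & 5 \\ 5 & 5 & 12 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \tag{24}$$

Then we perform elimination by pivoting to convert $A$ to $I$ and get

$$Ix = A^{-1}b \tag{25}$$

In Example 4 this is

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} .05958 & -.01166 & -.015 \\ -.03958 & .09167 & -.025 \\ -.00833 & -.03333 & .1 \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \tag{26}$$

Similarly, in trying to reverse the Markov chain, we wanted to convert
$Ap = Ip'$ into $Ip = A^{-1}p'$.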
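The conversion of $Ax = Ib$ into $Ix = A^{-1}b$ can be sketched in a few lines. This is an illustrative implementation, not a routine from the text: the function name is hypothetical, exact fractions are used to avoid roundoff, and no row exchanges are attempted, so a zero pivot is simply reported as a failure.

```python
from fractions import Fraction


def invert_by_pivoting(matrix):
    """Pivot down the main diagonal of [A | I]; return the right block, or None on a zero pivot."""
    n = len(matrix)
    # Build the augmented matrix [A | I] with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for k in range(n):
        if aug[k][k] == 0:
            return None  # this sketch does not exchange rows
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]          # make the pivot entry 1
        for i in range(n):
            if i != k:
                factor = aug[i][k]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[k])]  # clear column k
    return [row[n:] for row in aug]


A = [[20, 4, 4], [10, 14, 5], [5, 5, 12]]   # the matrix in (24)
A_inv = invert_by_pivoting(A)
for row in A_inv:
    print([round(float(v), 5) for v in row])
```

The rows printed agree with (26) up to rounding in the last displayed digit.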
We next turn to some theory about inverses.

Theorem 3. Properties of the Inverse
(i) If $A$ and $B$ are invertible matrices, $AB$ is invertible and
$(AB)^{-1} = B^{-1}A^{-1}$
(ii) If $A$ is an invertible matrix, $A^{-1}$ is invertible and
$(A^{-1})^{-1} = A$
(iii) If $A$ is an invertible matrix, so is its transpose $A^T$ and
$(A^T)^{-1} = (A^{-1})^T$

Proof of (i). The reasoning given here is typical of proofs involving
inverses. Since the inverse of a matrix is unique (this fact is critical),
we only need to check that $AB$ times $B^{-1}A^{-1}$ is $I$.

$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$$


Next we show the links between inverses, determinants, and solutions
of systems of linear equations. First note the following relation between the
determinants of $A$ and of $A^{-1}$ (assuming that $A^{-1}$ exists). The identities

$$\det(AB) = \det(A) \cdot \det(B) \quad\text{and}\quad AA^{-1} = I$$

together imply

$$\det(A) \cdot \det(A^{-1}) = \det(AA^{-1}) = \det(I) = 1$$

Thus

$$\det(A^{-1}) = \frac{1}{\det(A)} \quad\text{and}\quad \det(A) = \frac{1}{\det(A^{-1})} \tag{27}$$
Theorem 4. Fundamental Theorem for Solving $Ax = b$. The following
four statements are equivalent for any n-by-n matrix $A$.

(i) For all $b$, the system of equations $Ax = b$ always has a unique
solution.
(ii) The system of equations $Ax = Ib$ can be converted, using elimination
by pivoting, to the system $Ix = A^{-1}b$.
(iii) $A$ has an inverse.
(iv) $\det(A) \neq 0$.

Proof
(i) → (ii): The elimination by pivoting in (ii) is equivalent to
simultaneously solving $Ax = e_j$ for $j = 1, 2, \ldots, n$, and
$Ax = e_j$ can be solved by (i).
(ii) → (iii): Obvious.
(iii) → (iv): If $A^{-1}$ exists, then formula (27) says that $\det(A) =
1/\det(A^{-1}) \neq 0$.
(iv) → (i): As noted in Theorem 1 of Section 3.1, when $\det(A) \neq
0$, Cramer's rule gives a unique solution to $Ax = b$. ■


We have the following useful corollary.

Corollary

(i) If for some $b$, the system of equations $Ax = b$ has two solutions,
then $A$ is not invertible.

(ii) Conversely, if $A$ is not invertible, then for all $b$, $Ax = b$ has
either no solution or multiple solutions.


We conclude this section by incorporating inverses into the eigenvalue-based
analysis of growth models that was developed in Sections 2.5
and 3.1.

Example 7. Computer-Dog Growth Model Revisited

In Section 2.5 we introduced the computer-dog growth model

$$x' = Ax \quad\text{where}\quad A = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix},\quad x = [C, D]$$

The two eigenvalues and associated eigenvectors of $A$ were $\lambda_1 = 4$
with $u = [1, 1]$ and $\lambda_2 = 1$ with $v = [1, -2]$.

Our starting vector was $x = [1, 7]$. We expressed $x$ as a linear
combination, $x = au + bv$, of $u$ and $v$. For this $x$, it is

$$x = 3u - 2v \quad (\text{i.e., } [1, 7] = 3[1, 1] - 2[1, -2]) \tag{28}$$

Then

$$\begin{aligned} Ax = A(3u - 2v) &= 3Au - 2Av \\ &= 3(4u) - 2(1v) \quad (\text{since } u, v \text{ are eigenvectors}) \\ &= 12u - 2v \end{aligned} \tag{29}$$

After $k$ periods, we have

$$\begin{aligned} A^kx = A^k(3u - 2v) &= 3A^ku - 2A^kv \\ &= 3(4^ku) - 2(1^kv) \\ &= 3 \cdot 4^k[1, 1] - [2, -4] \end{aligned} \tag{30}$$ ■


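We now describe in matrix notation the three basic steps in the
eigenvalue-based analysis of growth models, as illustrated in Example 7.

Step 1. Express $x$ as a linear combination of eigenvectors, as in (28).
This step, which involves solving a system of equations, can be expressed
in terms of an inverse. If $x = [x_1, x_2]$, $u = [u_1, u_2]$, and $v = [v_1, v_2]$, the
statement $x = au + bv$ is equivalent to

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = a\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} + b\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \quad\text{or}\quad x = Uc, \text{ where } U = \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \end{bmatrix},\ c = \begin{bmatrix} a \\ b \end{bmatrix}$$

By using inverses, the solution to $x = Uc$ is

$$c = U^{-1}x \tag{31}$$

Step 2. Given $c = [a, b]$, multiply $a$ by $\lambda_1$ and $b$ by $\lambda_2$. We can write
this step in matrix notation as follows:

$$c' = \begin{bmatrix} a\lambda_1 \\ b\lambda_2 \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = D_\lambda c \tag{32}$$

where $D_\lambda$ is the diagonal matrix of eigenvalues.

Step 3. Express $Ax$ as a linear combination of eigenvectors.

$$Ax = a\lambda_1\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} + b\lambda_2\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = U\begin{bmatrix} a\lambda_1 \\ b\lambda_2 \end{bmatrix} = Uc' \tag{33}$$

where $U$ is the matrix with eigenvectors as columns (see above).

Combining the three steps (31), (32), and (33), we have

$$Ax = Uc' = U(D_\lambda c) = UD_\lambda U^{-1}x \tag{34}$$

This equation is true for any $x$, and hence we have

$$A = UD_\lambda U^{-1} \tag{35}$$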


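Furthermore, as in (30), powers of $A$ have a similar form (verification is
left as an exercise):

$$A^k = UD_\lambda^kU^{-1} \tag{36}$$

For the computer-dog matrix, (35) becomes [we compute the inverse $U^{-1}$
using the determinant-based formula (9)]

$$\begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \tfrac{2}{3} & \tfrac{1}{3} \\ \tfrac{1}{3} & -\tfrac{1}{3} \end{bmatrix} \tag{37}$$

and (36) becomes

$$\begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}^k = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}\begin{bmatrix} 4^k & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \tfrac{2}{3} & \tfrac{1}{3} \\ \tfrac{1}{3} & -\tfrac{1}{3} \end{bmatrix} \tag{38}$$

Equations (35) and (36) formalize in single matrix equations our
eigenvector-based computations for a growth model. They also give us a
simple way to compute powers of any matrix $A$, provided that we know
the eigenvalues and eigenvectors of $A$.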
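As a check on (36), the three-step computation of $A^kx$ can be compared with plain repeated multiplication for the computer-dog matrix. This is a minimal sketch with illustrative function names; the eigenvalues, eigenvectors, and $U^{-1}$ are those of Example 7, written with exact fractions.

```python
from fractions import Fraction

A = [[3, 1], [2, 2]]
U = [[1, 1], [1, -2]]                      # columns are the eigenvectors u and v
U_INV = [[Fraction(2, 3), Fraction(1, 3)],
         [Fraction(1, 3), Fraction(-1, 3)]]
EIGENVALUES = [4, 1]


def mat_vec(M, x):
    """Matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def power_times_vector(k, x):
    """Compute A^k x as U D^k U^{-1} x, working from right to left."""
    c = mat_vec(U_INV, x)                                         # eigenvector coordinates
    c_scaled = [lam ** k * ci for lam, ci in zip(EIGENVALUES, c)]  # multiply by D^k
    return mat_vec(U, c_scaled)                                    # back to standard coordinates


def repeated_multiplication(k, x):
    """Compute A^k x by multiplying by A a total of k times."""
    for _ in range(k):
        x = mat_vec(A, x)
    return x


x = [1, 7]
print(power_times_vector(3, x), repeated_multiplication(3, x))
```

The eigenvalue route needs the same small amount of work for every $k$, while repeated multiplication needs $k$ matrix-vector products.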


Theorem 5. If $A$ is an n-by-n matrix with $n$ linearly independent eigenvectors
$u_1, u_2, \ldots, u_n$ and associated eigenvalues $|\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_n|$,
then

$$A = UD_\lambda U^{-1} \quad\text{and}\quad A^k = UD_\lambda^kU^{-1} \tag{39}$$

where $U$ is an n-by-n matrix whose $j$th column is $u_j$.
Stating (39) in words, multiplication by $A$ is equivalent to:
(i) converting to eigenvector coordinates (multiplying by $U^{-1}$ does this);
(ii) multiplying those coordinates by the eigenvalues (multiplying by $D_\lambda$
does this); and finally
(iii) converting back to standard coordinates (multiplying by $U$ does this).

The formula $A = UD_\lambda U^{-1}$ for computing $Ax$ can be visualized with
the following diagram:

$$\begin{array}{ccc} x & \xrightarrow{\;A\;} & Ax = UD_\lambda U^{-1}x \\ \downarrow {\scriptstyle U^{-1}} & & \uparrow {\scriptstyle U} \\ U^{-1}x & \xrightarrow{\;D_\lambda\;} & D_\lambda U^{-1}x \end{array}$$


This eigenvalue decomposition of a matrix, often called diagonalization
of $A$, is extremely important. Besides simplifying the computation of
powers of a matrix, it can also be applied to other functions of a matrix $A$.
In differential equations (Section 4.3), the expression $e^{At}$ arises frequently
and can be evaluated with the help of (39). This decomposition will be
discussed further in Section 5.5, where we also present methods to find all
the eigenvalues of a matrix.

We note that, like the LU decomposition, the eigenvalue decomposition
is an example of using the proper representation of a matrix and an example
of a matrix product being a "program" (see the end of Section 3.2). If we
want to solve systems such as $Ax = b$, the LU decomposition is the right
way to "store" $A$. If we want to raise $A$ to various powers, the eigenvalue
decomposition is the appropriate way to "store" $A$. Computing $A^kx$ as
$(UD_\lambda^kU^{-1})x$ is a case where the matrix product $UD_\lambda^kU^{-1}$ is a "program"
telling us how to compute $A^kx$ through a sequence of three distinct steps.

We now apply Theorem 5 to our rabbit-fox model to rework the
computations done at the end of Section 3.1.


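Example 8. Rabbit-Fox Growth Model Revisited

The growth model we have used repeatedly is

$$\begin{aligned} R' &= R + .1R - .15F \\ F' &= F + .1R - .15F \end{aligned} \quad\text{or}\quad p' = Ap, \text{ where } A = \begin{bmatrix} 1.1 & -.15 \\ .1 & .85 \end{bmatrix} \tag{40}$$

In Section 3.1 we found the eigenvalues and eigenvectors to be

$$\lambda_1 = 1,\ u_1 = [3, 2] \qquad \lambda_2 = .95,\ u_2 = [1, 1]$$

So

$$A = UD_\lambda U^{-1} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & .95 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix} \tag{41}$$

where again the 2-by-2 inverse formula (9) is used for $U^{-1}$.

Suppose that our starting vector is $p = [50, 40]$ and we want to
determine the population vector after 20 periods. Then we must compute

$$p^{(20)} = A^{20}p = UD_\lambda^{20}U^{-1}p = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & .95^{20} \end{bmatrix}\begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix}\begin{bmatrix} 50 \\ 40 \end{bmatrix} \tag{42}$$

We evaluate (42) from right to left. That is, first compute

$$c = U^{-1}p = \begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix}\begin{bmatrix} 50 \\ 40 \end{bmatrix} = \begin{bmatrix} 10 \\ 20 \end{bmatrix} \tag{43}$$

Recall from Example 7 that this vector $c$ is the set of weights $a, b$ on
the eigenvectors. Next we multiply this vector of weights times $D_\lambda^{20}$:

$$c' = D_\lambda^{20}c = \begin{bmatrix} 1 & 0 \\ 0 & .3585 \end{bmatrix}\begin{bmatrix} 10 \\ 20 \end{bmatrix} = \begin{bmatrix} 10 \\ 7.17 \end{bmatrix}$$

Finally, we use $c'$ to form a linear combination of the eigenvectors:

$$p^{(20)} = A^{20}p = Uc' = 10\begin{bmatrix} 3 \\ 2 \end{bmatrix} + 7.17\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 37.17 \\ 27.17 \end{bmatrix}$$ ■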


Section 3.3 Exercises 


Summary of Exercises 

Exercises 1-17 involve computation of inverses and interpretation of entries 
in an inverse; Exercise 4 gives an important geometric picture of inverses. 
Exercises 18-27 examine properties of inverses. Exercises 28 and 29 deal 
with the existence of solutions and applications of Theorem 4. Exercises 
30-37 deal with eigenvalues and Theorem 5. 


1. Verify for the matrix $A$ in Example 1, part (i) that $A^{-1}A = I$.

2. Write the system of equations that entries in the inverse of the following
matrices must satisfy. Then find inverses (as in Example 2) or show
that none can exist (following the reasoning in Example 1).
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ee 01 2 | -1 3 
" k 7 ” & | * ‘ «| 2 of 
| 3 

| 

: 


Fe 
(e) |}2 4 2 (f) | 0 
as ee 


3. (a) Write out the system of equations that the first column of the inverse 
of A must satisfy, where 


1 0 2 
A=i10 } 3 
| 4 
(b) Determine the first column of A~'; use part (a) and Cramer’s rule. 


4. This exercise gives a ‘‘picture’’ of how when two columns of A are 
almost the same, the inverse of A almost does not exist. For the fol- 


x | 

lowing matrices A, solve the system n J - Then plot x,a 
x2 

and x,a$ in a two-dimensional coordinate system and show geometri- 


| 
cally how the sum of vectors x,a© and x,a$ is " (here af, aS denote 


the two columns of A). 


2 3 2 3 g 10 g 9 
(a) ; 4 (0) i . (c) " 7 (a) ‘ s 


5. (a) Find the inverse of the transition matrix A for the weather Markov 
chain (introduced in Example | of Section 1.3), where 


“l 


(b) If $p$ is today's weather probability distribution and $p^0$ is yesterday's
distribution, show that $p^0 = A^{-1}p$.
(c) Find yesterday's weather probability distribution if today’s weather 
probability distribution is 
(i) p= [3,3] Gi [3.3] (iii) (0, 1] 
(d) Use a computer program to determine the weather probability dis- 
tribution 20 days ago for the current distributions in part (c). 




6. (a) Find the system of equations for decoding the following encoding
schemes.
(i) $C_1 = 3L_1 + 5L_2$, $C_2 = 5L_1 + 8L_2$
(ii) $C_1 = 11L_1 + 6L_2$, $C_2 = 8L_1 + 5L_2$
(iii) $C_1 = 2L_1 + 3L_2$, $C_2 = 7L_1 + 5L_2$

Hint: The inverse of 7 is −11; the inverse of −11 is 7.
(b) Decode the coded pair EF in each of these schemes.

7. Use elimination by pivoting to find the inverse of the following
matrices.
aS ae =; as 
(a) 1 -] l (b) 2 2 —-41 
—] 5 4 —2 3 
—j| -3 2 2 4 -2 
(c) 2 | 3 (d) 1 -2 -4 
5 4 6 =—2 -] -3 
rs 4 2-3 -] 
fers. b 3 (ff) }3 -5 -2 
a ae y 6 4 
8. For each matrix $A$ in Exercise 7, solve $Ax = b$, where $b =
[10, 10, 10]$.

9. For each matrix $A$ in Exercise 7, how much will the solution of
$Ax = b$ change if $b$ is changed
(a) From the vector $[b_1, b_2, b_3]$ to the vector $[b_1, b_2 + 1, b_3]$?
(b) From the vector $[b_1, b_2, b_3]$ to the vector $[b_1, b_2, b_3 - 2]$?
(c) From the vector $[b_1, b_2, b_3]$ to the vector $[b_1, b_2 + 1, b_3 - 1]$?

10. Use elimination by pivoting to find the inverse of the following
matrices.


>) iia i ee oa 
¢) ig “Eee Ta oe oe 
ih Se ee | ee 
Ls pers i Tae Ss eae de i 
je ® “ES 
7. 7 eae 
Mean EG 
‘Seen es ae 


11. (a) Find the inverse of the tridiagonal matrix
l -] 0 0 


0 
=] z =1 0 0 
Oi .—% Zo] 0 
l 

2 
Note that the inverse is not tridiagonal or in any way sparse.
(b) Change entry (1, 1) from a 1 to a 2 and repeat part (a). Does this
small change affect the inverse substantially?

12. Reverse the following Markov chains. Then find the "distribution" in
the previous period if the current distribution is [.5 0 .5]. Is this
distribution really a probability distribution?


Re i S 3s, 79 A 3 
(a). VS). 2S hy £5 So 25 (c) |.3 4 3 
00 .5 G28 SB 3 34 a 


13. Try to find the inverse of the frogger Markov chain when there are five,
not six states (three lanes of highway).

14. (a) Describe those n-by-n Markov transition matrices $A^*$ (for each $n$)
that have an inverse $A^{*-1}$ such that if $p$ is a probability distribution,
then $p^0 = A^{*-1}p$ is always a probability distribution.
(b) Give an informal argument why no other such reversible Markov
chain can exist (the reasoning is similar to that used in Example 6).

15. (Continuation of Exercise 17 in Section 3.2) The staff dietician at the
California Institute of Trigonometry has to make up a meal with 600
calories, 20 grams of protein, and 200 milligrams of vitamin C. There
are three food types to choose from: rubbery jello, dried fish sticks, and
mystery meat. They have the following nutritional content per ounce.

             Jello   Fish Sticks   Mystery Meat
Calories      10         50            200
Protein        1          3              2
Vitamin C     30         10              0

(a) Find the inverse of this data matrix and use it to compute the amount
of jello, fish sticks, and mystery meat required.
(b) If the protein requirement is increased by 4, how will this change
the number of units of jello in the meal?
(c) If the vitamin C requirement is decreased by $k$ milligrams, how
much will this change the number of fish sticks in a meal?

16. (Continuation of Exercise 18 in Section 3.2) A furniture manufacturer
makes tables, chairs, and sofas. In one month, the company has available
300 units of wood, 350 units of labor, and 225 units of upholstery.
The manufacturer wants a production schedule for the month that uses
all of these resources. The different products require the following
amounts of the resources.
             Table   Chair   Sofa
Wood           4       1       3
Labor          3       2       5
Upholstery     2       0       4


(a) Find the inverse of this data matrix and use it to determine how
much of each product should be manufactured.
(b) If the amount of wood is increased by 30 units, how will this change
the number of sofas produced?
(c) If the amount of labor is decreased by $k$, how much will this change
your answer in part (a)?

17. (Continuation of Exercise 20 of Section 3.2) An investment analyst is
trying to find out how much business a secretive TV manufacturer has.
The company makes three brands of TV set: brand A, brand B, and
brand C. The analyst learns that the manufacturer has ordered from
suppliers 450,000 type-1 circuit boards, 300,000 type-2 circuit boards,
and 350,000 type-3 circuit boards. Brand A uses 2 type-1 boards, 1
type-2 board, and 2 type-3 boards. Brand B uses 3 type-1 boards, 2
type-2 boards, and 1 type-3 board. Brand C uses 1 board of each type.
(a) Set up this problem as a system $Ax = b$. Find the inverse of $A$
and use it to determine how many TV sets of each brand are being
manufactured.
(b) If the number of type-2 boards used is increased by 100,000, how
will this change your answer in part (a)?
(c) If the number of type-1 boards is decreased by 10,000k, how much
will this change your answer in part (a)?

18. Why must a matrix be square if it has an inverse?

19. Verify that for any invertible matrix $A$, the inverse of the inverse $A^{-1}$
is $A$.

20. Verify that the inverse of $A^T$ is $(A^{-1})^T$.

Hint: Use the multiplication rule for transposes, $(CD)^T = D^TC^T$.

21. Show that the inverse of a matrix is unique.

Hint: If $B$ and $C$ are inverses of the matrix $A$, compute $BAC$ two
different ways as $(BA)C$ and as $B(AC)$.

22. Show, by the reasoning in Example 1, that if a matrix has a row (or
column) that is all 0's, then the matrix cannot have an inverse.


23. (a) Following the reasoning in Example 1, show that

cannot have an inverse because the third column is the sum of the
other two columns.

(b) Generalize the argument in part (a) to show that if one row (column) 
is a linear combination of two others, a = caf + da‘, then the 
matrix cannot have an inverse. 


24. In Theorem 1, show that the solution x = A~'b is unique, that is, 
there cannot exist a different vector x’ with Ax’ = b. 


Hint: Multiply both sides of Ax’ = b by A~!. 


25. Find the inverse of a diagonal matrix 


a, Do B 
0 a. 0 
0 YU a, 


Hint: The inverse is also diagonal. 


26. (a) Use the following fact: The inverse of an upper triangular matrix 
(if the inverse exists) is itself upper triangular, to determine what 
the main diagonal entries must be in the inverse of the upper tri- 
angular matrix 


2 3 4 
a 4 2 
O Bd 


(Do not use elimination by pivoting.) 

(b) Use the main-diagonal entries in the inverse from part (a) and the 
fact that the inverse is upper triangular. Determine the other entries 
in the inverse. 

(c) Consider how the computations to find the inverse in elimination 
by pivoting would go to show that the inverse of an upper triangular 
matrix must be upper triangular. 


27. (a) Determine the inverse of the matrix 


So 2 © — 
oS & = © 
Qo — © @ 
—= © ©O ©& 
Hint: The inverse has a simple form; try trial-and-error guesswork.

(b) Generalize your result in part (a) to give the inverse of an n-by-n
matrix with 1's on the main diagonal and 0's elsewhere except one
position, entry $(i, j)$, $i \neq j$, whose value is $a$.


28. Which of the following conditions guarantees that the system of
equations $Ax = b$ has a unique solution; which guarantees that the system
does not have a solution or that it is not unique, or guarantees nothing?
Explain the reason for your answer. Assume that $A$ is a square matrix.
(a) A has an inverse. 

(b) det(A) = 0. 

(c) Ax = b’ has a unique solution for some other b’. 

(d) Ax = b’ has two solutions for some other b’. 

(e) b equals a column of A. 

(f{) The first row of A is twice the last row of A. 


29. Which of the following conditions guarantees that a matrix $A$ has an
inverse; which guarantees that it does not have an inverse? Explain the
reason for your answer briefly.

(a) The determinant of the matrix equals 17. 

(b) A has twice as many rows as columns. 

(c) A is a 4-by-4 Markov chain matrix. 

(d) The first row of A is twice the last row. 

(e) The system of equations Ax = b can be solved for any b. 


2 


The matrix B = : is the inverse of A = f | 

—3 3 ae: 

(a) Verify that $u_1 = [1, 1]$ and $u_2 = [1, -1]$ are eigenvectors of both
$A$ and $B$.
(b) Determine the eigenvalues of $A$ and $B$. How are the eigenvalues
of $A$ and $B$ related?

31. Show that the eigenvectors of $A^{-1}$ must be the same as the eigenvectors
of $A$.

Hint: Use the fact that $A^{-1}(Au) = u$.

32. Assuming that $A$ and $A^{-1}$ have the same eigenvectors, show that the
eigenvalues of $A^{-1}$ must be the reciprocals of the eigenvalues of $A$
(i.e., $1/\lambda$).

Hint: Use the fact that $A^{-1}(Au) = u$.

33. Compute the representation $UD_\lambda U^{-1}$ of Theorem 5 for the following
matrices whose eigenvalues and largest eigenvector you were asked to
determine in Exercise 23 of Section 3.1.


4 0 hee: a 4 ony 
af ED Beha 
34. For a starting vector of $p = [10, 10]$, compute $p^{(10)} = A^{10}p$ for each
matrix $A$ in Exercise 33 (use your representation of $A$ found in Exercise
33).


35. (a) Given that $A = UD_\lambda U^{-1}$, prove that $A^2 = UD_\lambda^2U^{-1}$.
(b) Use induction to prove $A^k = UD_\lambda^kU^{-1}$.


36. (a) Obtain a formula for $A^{-1}$ similar to $A = UD_\lambda U^{-1}$.
Hint: Only the matrix $D_\lambda$ will be different.


a 4 
(b) Verify your formula in part (a) for A = 3 | 


37. Show that A is not invertible if 0 is an eigenvalue. 


Solving Matrix Problems 
by Iteration 


In this section we show how simple iteration methods can be used first to 
determine eigenvalues and eigenvectors, and then to solve systems of linear 
equations. We want to use an iterative method to find the largest eigenvalue 
(in absolute value) of a matrix $A$ and an associated eigenvector. The largest
eigenvalue is the largest root (in absolute value) of the characteristic
polynomial $\det(A - \lambda I)$. However, it is difficult to find the roots of polynomials
beyond quadratics. Iterative methods are easier to use and yield both the
largest eigenvalue and an associated eigenvector.

In Section 2.5 we saw how the largest eigenvalue and its associated 
eigenvector dominate the long-term behavior of a linear growth model. We 
briefly review the results we obtained for the computer—dog growth model. 


Example 1. Review of Role of Largest
Eigenvalues in a Growth Model

The growth model for computers ($C$) and dogs ($D$) from year to year
was

$$\begin{aligned} C' &= 3C + D \\ D' &= 2C + 2D \end{aligned} \tag{1}$$

or

$$x' = Ax, \quad\text{where}\quad A = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}$$

We saw in Section 2.5 that 4 is the larger eigenvalue with eigenvector
$u = [1, 1]$ (or any multiple of $[1, 1]$), and 1 is the other eigenvalue
with eigenvector $v = [1, -2]$.

We can write any vector $x$ as a linear combination of $u$ and $v$
(see Section 2.5 for details on how to do this). If the initial vector
is $x = [1, 7]$, we find that $x = 3u - 2v$ (in other words, $[1, 7] =
3[1, 1] - 2[1, -2]$). If we want to iterate this model for 20 years,
we can use $u$ and $v$ to compute $A^{20}x$ as follows:

$$\begin{aligned} A^{20}x = A^{20}(3u - 2v) &= 3A^{20}u - 2A^{20}v \\ &= 3(4^{20}u) - 2(1^{20}v) \\ &= [3 \cdot 4^{20}, 3 \cdot 4^{20}] - [2, -4] \end{aligned} \tag{2}$$

The term $[3 \cdot 4^{20}, 3 \cdot 4^{20}]$ in (2) swamps $[2, -4]$. So in general, after
$n$ periods, we have

$$A^nx \approx A^n(3u) = 3 \cdot 4^nu \quad\text{or}\quad A^nx \approx [3 \cdot 4^n, 3 \cdot 4^n]$$ ■

Since $A^nx \approx [3 \cdot 4^n, 3 \cdot 4^n]$, $A^nx$ is approximately a multiple of the
eigenvector $[1, 1]$. This means that we can reverse the previous reasoning
and find an eigenvector associated with the larger eigenvalue simply by
iterating a growth model for many periods.


Example 2. Determining Largest Eigenvalue and
Its Eigenvector by Iteration

In the computer-dog model of Example 1, let us iterate starting with
$x^{(0)} = [1, 7]$. We keep track of the growth rate from one period to
the next (the sum norm is used).

$x^{(0)} = [1, 7]$
$x^{(1)} = [10, 16]$                $|x^{(1)}|/|x^{(0)}| = 3.25$
$x^{(2)} = [46, 52]$                $|x^{(2)}|/|x^{(1)}| = 3.77$
$x^{(3)} = [190, 196]$              $|x^{(3)}|/|x^{(2)}| = 3.94$
$x^{(4)} = [766, 772]$              $|x^{(4)}|/|x^{(3)}| = 3.98$
$x^{(5)} = [3070, 3076]$            $|x^{(5)}|/|x^{(4)}| = 3.996$
$x^{(6)} = [12{,}286, 12{,}292]$    $|x^{(6)}|/|x^{(5)}| = 3.999$
$x^{(7)} = [49{,}150, 49{,}156]$    $|x^{(7)}|/|x^{(6)}| = 3.9998$

From the ratios of the norms of the successive iterates, it is clear that
the growth rate is converging to 4. The two components of the
successive iterates are approximately equal; that is, they are approximately
multiples of $[1, 1]$. So we conclude that the largest eigenvalue is 4 and its
eigenvectors are multiples of $[1, 1]$. ■
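The iteration of Example 2 is short enough to sketch directly. The function names below are illustrative; the sketch uses the sum norm for the growth ratio and rescales each iterate by its largest entry, which does not affect the ratio because both norms are taken before rescaling.

```python
def mat_vec(M, x):
    """Matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def sum_norm(x):
    return sum(abs(v) for v in x)


def dominant_eigen(M, x, steps):
    """Iterate x -> Mx, returning the last growth ratio and the scaled iterate."""
    ratio = None
    for _ in range(steps):
        nxt = mat_vec(M, x)
        ratio = sum_norm(nxt) / sum_norm(x)   # growth rate from one period to the next
        largest = max(abs(v) for v in nxt)    # assumes the iterate is not the zero vector
        x = [v / largest for v in nxt]        # scale so that the max norm is 1
    return ratio, x


A = [[3, 1], [2, 2]]
ratio, vector = dominant_eigen(A, [1, 7], 7)
print(ratio, vector)
```

After seven steps the ratio agrees with the last line of the table to four decimal places, and the scaled iterate is close to $[1, 1]$.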


When the sizes of the iterates $x^{(k)}$ get large, one can scale them back
by dividing their entries by the largest entry (so that their max norm is 1).
For example, $x^{(6)} = [12{,}286, 12{,}292]$ would be scaled by dividing by the
larger entry, 12,292, to obtain $x^{*(6)} = [.9995, 1]$.

It is common practice to call the largest eigenvalue (in absolute value)
the dominant eigenvalue because of the way it dominates the behavior of a
growth model. Summarizing our method, we have

Finding Dominant Eigenvalue and Associated Eigenvector of $x' =
Ax$ by Iteration. For any starting vector $x$, compute successive
iterates $x^{(k)}$ until the ratio $|x^{(k)}|/|x^{(k-1)}|$ converges to a fixed value. This
value is the absolute value of the dominant eigenvalue and $x^{(k)}$ is an
associated eigenvector. If $x^{(k)}$ becomes too large, "scale" it by dividing
$x^{(k)}$ by its largest entry.

Note that this iterative method was exactly how we found the stable
probability distribution $p^* = [.1, .2, .2, .2, .2, .1]$ for the frog Markov
chain (implicitly, the largest eigenvalue was 1).

A geometrical illustration of the convergence of $A^kx$ to an eigenvector
corresponding to the dominant eigenvalue is given in Figure 3.1. Using the
matrix $A = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}$ in the computer-dog growth model, we plot what
happens to the set of vectors $x$ in the first and third quadrants of the Cartesian
plane when we iterate with $A$: $x$, $Ax$, $A^2x$, $A^3x$. Figure 3.2 follows a
particular vector $x$ through such iteration.

We next try our iterative scheme on a 3-by-3 matrix. Note that for
3-by-3 matrices, there is no simple formula available for finding eigenvalues
as roots of the characteristic equation $\det(A - \lambda I) = 0$.


[Figure 3.1]

[Figure 3.2]
Example 3. Dominant Eigenvalue in a
3-by-3 Matrix

Let us expand our computer-dog growth model to include goats.

$$\begin{aligned} C' &= 3C + D + G \\ D' &= 2C + 2D \\ G' &= \phantom{2C + {}} D + 2G \end{aligned}$$

We start iterating with $x^{(0)} = [1, 1, 1]$.

$x^{(1)} = [5, 4, 3]$           $|x^{(1)}|/|x^{(0)}| = 4.00$
$x^{(2)} = [22, 18, 10]$        $|x^{(2)}|/|x^{(1)}| = 4.17$
$x^{(3)} = [94, 80, 38]$        $|x^{(3)}|/|x^{(2)}| = 4.24$
$x^{(4)} = [400, 348, 156]$     $|x^{(4)}|/|x^{(3)}| = 4.26$

Let us scale $x^{(4)} = [400, 348, 156]$ by dividing by its largest entry:

$x^{*(4)} = [1.00, .87, .39]$

$x^{(5)} = [4.26, 3.74, 1.65]$       $|x^{(5)}|/|x^{*(4)}| = 4.26$
$x^{(6)} = [18.17, 16.00, 7.04]$     $|x^{(6)}|/|x^{(5)}| = 4.27$
$x^{(7)} = [77.55, 68.34, 30.08]$    $|x^{(7)}|/|x^{(6)}| = 4.27$

So the dominant eigenvalue is 4.27. The scaled form of $x^{(7)}$ is (rounded
to the nearest hundredth) $[1, .88, .38]$. ■


Instead of taking the ratio of the sum norms of successive iterates to
approximate the dominant eigenvalue $\lambda^*$, it is more accurate to use the
following ratio, called the Rayleigh quotient.

$$\lambda^* \approx \frac{x^{(k)} \cdot x^{(k+1)}}{x^{(k)} \cdot x^{(k)}} = \frac{x^{(k)} \cdot Ax^{(k)}}{x^{(k)} \cdot x^{(k)}} \tag{3}$$

Applying the Rayleigh quotient with just $k = 2$ in Example 3, we get

$$\lambda^* \approx \frac{x^{(2)} \cdot x^{(3)}}{x^{(2)} \cdot x^{(2)}} = \frac{3888}{908} = 4.28$$
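The quotient in (3) takes only a few lines to evaluate. The sketch below uses the goat-model matrix of Example 3 with illustrative function names, and reproduces the $k = 2$ computation with integer arithmetic for the two dot products.

```python
def mat_vec(M, x):
    """Matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def rayleigh_quotient(M, x):
    """Estimate the dominant eigenvalue as (x . Mx) / (x . x)."""
    return dot(x, mat_vec(M, x)) / dot(x, x)


A = [[3, 1, 1], [2, 2, 0], [0, 1, 2]]   # computer-dog-goat model of Example 3

x = [1, 1, 1]
for _ in range(2):                       # two iterations give x^(2)
    x = mat_vec(A, x)

print(x, dot(x, mat_vec(A, x)), dot(x, x), rayleigh_quotient(A, x))
```

Running the loop for more iterations before forming the quotient gives successively better estimates, at the cost of one extra matrix-vector product per step.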
Solving a System of Equations by Iteration

Our initial discussion will center around the Leontief economic model
presented in Section 1.2. This model balanced supplies of a set of commodities
against the demand for the commodities, demand by industry (as input for
supplies produced), and demand by consumers. The sample model we
introduced in Section 1.2 was

$$\begin{array}{lrcl} & \text{Supplies} & & \text{Industrial Demands} + \text{Consumer Demand} \\ \text{Energy:} & x_1 & = & .4x_1 + .2x_2 + .2x_3 + .2x_4 + 100 \\ \text{Construct.:} & x_2 & = & .3x_1 + .3x_2 + .2x_3 + .1x_4 + 50 \\ \text{Transport.:} & x_3 & = & .1x_1 + .1x_2 \phantom{{} + .2x_3} + .2x_4 + 100 \\ \text{Steel:} & x_4 & = & \phantom{.1x_1 + {}} .1x_2 + .1x_3 \end{array} \tag{4}$$

or in matrix notation

$$x = Dx + c$$

where

$$D = \begin{bmatrix} .4 & .2 & .2 & .2 \\ .3 & .3 & .2 & .1 \\ .1 & .1 & 0 & .2 \\ 0 & .1 & .1 & 0 \end{bmatrix} \quad\text{and}\quad c = \begin{bmatrix} 100 \\ 50 \\ 100 \\ 0 \end{bmatrix}$$

Recall that there was an input constraint that the sum of the coefficients
in each column of $D$ be $< 1$. This means that the sum norm of $D$ (the sum
norm is the largest column sum) is $< 1$:

$$\|D\|_s < 1$$

We can rewrite $x = Dx + c$ as $x - Dx = c$ or

$$(I - D)x = c \tag{5}$$

We can solve (5) algebraically with inverses to obtain

$$x = (I - D)^{-1}c \tag{6}$$

We shall now give an algebraic formula for computing $(I - D)^{-1}$.


Lemma. Let $A$ be a matrix such that $\|A\| < 1$ (any matrix norm can be
used). Then

$$(I - A)^{-1} = \sum_{k=0}^{\infty} A^k = I + A + A^2 + A^3 + \cdots \tag{7}$$

Here $A^0 = I$, just as $r^0 = 1$ for any scalar $r$. Formula (7) is
simply the matrix form of the geometric series

$$\frac{1}{1 - a} = \sum_{k=0}^{\infty} a^k = 1 + a + a^2 + a^3 + \cdots \tag{8}$$

Recall that this series converges only when $|a| < 1$. The verification of
(7) is similar to the way (8) was verified in high school: simply multiply
$1 - a$ times the infinite series and show that the product equals 1
[equals $I$ in (7)].

Since $\|D\|_s < 1$, we can use (7) to compute $(I - D)^{-1}$. Recall that
$\|D^k\|_s \leq \|D\|_s^k$, and $\|D\|_s^k \to 0$ (since $\|D\|_s < 1$). Then the sum norm (the largest
column sum) of $D^k$ approaches 0, so the individual entries of $D^k$ approach
0. Thus we only need to calculate the sum $\sum D^k$ up to, say, the twentieth
power of $D$; the remaining powers will be small enough to neglect.

Using the sum $\sum D^k$ is not a very efficient way to compute $(I - D)^{-1}$
for most matrices. However, the formula is simple and easy to program.
The method also has the advantage of avoiding roundoff-error problems: The
iterated multiplications will not magnify possible errors in values of the
coefficients; instead, the errors shrink, since the entries in $D^k$ all approach
0. Finally, this method guarantees that one can always solve a Leontief
economic model for any $D$ and any $c$ (provided that $\|D\|_s < 1$).

Theorem 1. Every Leontief supply-demand model $x = Dx + c$ has a
solution of nonnegative production levels for every nonnegative $c$ and
every nonnegative $D$, provided that $\|D\|_s < 1$.

The nonnegativity is very important, since a negative solution is
essentially no solution. Nonnegativity follows from (7): All entries in the
powers $D^k$ will be $\geq 0$ (since all entries in $D$ are $\geq 0$), so all entries in
$\sum D^k = (I - D)^{-1}$ are $\geq 0$. Also, $c \geq 0$, so all entries in $(I - D)^{-1}c$ are
$\geq 0$.


Example 4. Solution of a Leontief Model 


We solve the Leontief model in (4) by using formula (7) to solve 
(I — D)x = c. We list below the first powers of D up to D’ plus D*°, 
and the sum of the right-hand side of (7) up to D’° (using a computer 


program). 
24 48... 14 iA 164 .13 .098 .094 
ee ee ae ge Ts Ses Pe Be 159 .126 .095 .09 
OF OT. 06° “03 ~ 1.055 .044 .031 .033 


04 .04 02 .03 03:., 6025 O19 O16 
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.114 
1] 
.O38 
O21 


O56 
O54 
O19 
.O10 


D° 


O91 
O88 
.O31 
O17 


044 
.042 
O15 
.008 


p2o = 


O00] 
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.0003 
.0003 
O00] 
.O00 1 


0001 


O80 
.O77 
027 
OS 


1.898 


.046 
.044 


(9) 


.664 
.644 
1.167 
18] 


.650 
330 
335 
1.087 


(10) 


The entries in powers of D are decreasing quickly enough so that the 

numbers in (10) are accurate to the three decimal places shown. 
With (10) we can now solve the Leontief model for x, the vector 

of the production levels for the four products. 


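\[
x = (I - D)^{-1}c =
\begin{bmatrix}
2.183 & .811 & .664 & .650 \\
1.056 & 1.898 & .644 & .530 \\
.352 & .315 & 1.167 & .335 \\
.141 & .221 & .181 & 1.087
\end{bmatrix}
\begin{bmatrix} 100 \\ 50 \\ 100 \\ 0 \end{bmatrix}
\]

\[
= \begin{bmatrix}
2.183 \times 100 + .811 \times 50 + .664 \times 100 + .650 \times 0 \\
1.056 \times 100 + 1.898 \times 50 + .644 \times 100 + .530 \times 0 \\
.352 \times 100 + .315 \times 50 + 1.167 \times 100 + .335 \times 0 \\
.141 \times 100 + .221 \times 50 + .181 \times 100 + 1.087 \times 0
\end{bmatrix}
\tag{11a}
\]

\[
= \begin{bmatrix}
218.3 + 40.55 + 66.4 + 0 \\
105.6 + 94.9 + 64.4 + 0 \\
35.2 + 15.75 + 116.7 + 0 \\
14.1 + 11.05 + 18.1 + 0
\end{bmatrix}
\tag{11b}
\]

\[
= \begin{bmatrix} 325 \\ 265 \\ 168 \\ 43 \end{bmatrix}
\begin{array}{l}
\text{units of energy} \\
\text{units of construction} \\
\text{units of transportation} \\
\text{units of steel}
\end{array}
\tag{11c}
\]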

The terms in $(I - D)^{-1}$ allow us to see how the consumer
demand affects the interindustry demands. For example, the total
demand for energy, the first sum in (11a), is

\[ 2.183 \times 100 + .811 \times 50 + .664 \times 100 + .650 \times 0 \]

This sum says that each of the 100 units of consumer demand for
energy requires 2.183 units of energy to be produced, each of the 50
units of consumer demand for construction requires .811 unit of energy
to be produced, and so on.


If we do not need the inverse $(I - D)^{-1}$ (to solve the Leontief system
for many different consumer vectors) but just want the solution for one
specific $c$, we can shorten our effort by rewriting (11a) in terms of the
powers of $D$.

\[ x = (I - D)^{-1}c = \Big(\sum_k D^k\Big)c = \sum_k D^k c \tag{12} \]

Computing the sum of the $D^k c$'s is faster than first computing the sum of
the $D^k$'s and then multiplying by $c$: We compute the vector $Dc$; then by
multiplying this vector by $D$ we get $D^2c$, then $D^3c$, and so on, with
each stage involving a matrix-vector product rather than matrix-matrix. It
is left as an exercise for the reader to re-solve the Leontief problem using
(12) (again stop at $D^{20}c$).
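The shortcut in (12) can be sketched in a few lines of Python. This is only an illustration, not the computer program referred to in the text: the function names `mat_vec` and `sum_of_powers_solution` are invented for this sketch, and the entries of D and c are taken from the coefficients that appear in the iteration computations later in this section.

```python
# Leontief data of model (4): D is the interindustry matrix, c the consumer demand.
D = [
    [0.4, 0.2, 0.2, 0.2],   # energy
    [0.3, 0.3, 0.2, 0.1],   # construction
    [0.1, 0.1, 0.0, 0.2],   # transportation
    [0.0, 0.1, 0.1, 0.0],   # steel
]
c = [100.0, 50.0, 100.0, 0.0]


def mat_vec(M, v):
    """Return the matrix-vector product Mv."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def sum_of_powers_solution(D, c, highest_power=20):
    """Approximate (I - D)^(-1) c by c + Dc + D^2 c + ... + D^highest_power c."""
    term = list(c)     # current term D^k c, starting with k = 0
    total = list(c)    # running partial sum
    for _ in range(highest_power):
        term = mat_vec(D, term)                        # D^(k+1) c from D^k c
        total = [t + s for t, s in zip(total, term)]   # add it to the sum
    return total


x = sum_of_powers_solution(D, c)
print([round(v) for v in x])
```

Each pass through the loop costs one matrix-vector product, so no power of D is ever formed explicitly. Rounded to integers, the result should agree with (11c).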

There is another way that we can recast the solution of $(I - D)x = c$.
The method is called solution by iteration. This is the method we used
in Chapter 1 to get an approximate solution to the Leontief model. Iteration
was also used to compute the stable distribution vector $p^* = [.1, .2, .2,
.2, .2, .1]$ of the frog Markov chain in Section 1.3. There we repeatedly
computed the next-state distribution $p^{(k)}$ using the transition equations

\[ p^{(k)} = Ap^{(k-1)} \]

and $p^{(k)}$ converged to $p^*$. Let us recall how iteration with the
Leontief system worked. We use the system of equations

\[ x = Dx + c \tag{13} \]

We guess values $x^{(0)}$ for the vector $x$, then substitute $x^{(0)}$ in the
right side of (13) and compute $Dx^{(0)} + c$. We check to see if
$Dx^{(0)} + c$ equals the left side $x^{(0)}$. If not, we take this vector as
our next estimate:

\[ x^{(1)} = Dx^{(0)} + c \]


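Suppose that we "guess" $x^{(0)} = c$; that is, we just produce enough to
meet consumer demand. In our sample Leontief model in (4), $c$ is the vector
[100, 50, 100, 0]. Let us compute $Dx^{(0)} + c$ for this model, with
$x^{(0)} = c = [100, 50, 100, 0]$.

\[
Dx^{(0)} + c =
\begin{bmatrix}
.4 \times 100 + .2 \times 50 + .2 \times 100 + .2 \times 0 \\
.3 \times 100 + .3 \times 50 + .2 \times 100 + .1 \times 0 \\
.1 \times 100 + .1 \times 50 + .2 \times 0 \\
.1 \times 50 + .1 \times 100
\end{bmatrix}
+ \begin{bmatrix} 100 \\ 50 \\ 100 \\ 0 \end{bmatrix}
= \begin{bmatrix} 70 + 100 \\ 65 + 50 \\ 15 + 100 \\ 15 + 0 \end{bmatrix}
= \begin{bmatrix} 170 \\ 115 \\ 115 \\ 15 \end{bmatrix}
\]

This vector does not equal $x^{(0)}$, so we set $x^{(1)} = [170, 115, 115, 15]$.
This new estimate of the production levels equals consumer demands $c$ plus
the interindustrial demands $Dc$ to meet the consumer demand.
We now compute $Dx^{(1)} + c$.

\[
Dx^{(1)} + c =
\begin{bmatrix}
.4 \times 170 + .2 \times 115 + .2 \times 115 + .2 \times 15 \\
.3 \times 170 + .3 \times 115 + .2 \times 115 + .1 \times 15 \\
.1 \times 170 + .1 \times 115 + .2 \times 15 \\
.1 \times 115 + .1 \times 115
\end{bmatrix}
+ \begin{bmatrix} 100 \\ 50 \\ 100 \\ 0 \end{bmatrix}
= \begin{bmatrix} 117 + 100 \\ 110 + 50 \\ 31.5 + 100 \\ 23 + 0 \end{bmatrix}
= \begin{bmatrix} 217 \\ 160 \\ 131.5 \\ 23 \end{bmatrix}
\]

Then we continue with the iteration, and we set

\[ x^{(2)} = Dx^{(1)} + c = [217, 160, 131.5, 23] \]

and in general we set

\[ x^{(k+1)} = Dx^{(k)} + c \tag{14} \]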


In terms of a computer program, we are repeatedly performing the
assignment statement


x <- Dx + c


Every increase in production levels results in a further increase in the
interindustry demand $Dx$. The question is: Will this process converge to a
solution $x^*$ such that $x^* = Dx^* + c$? In Chapter 1 we claimed that this
iteration process did converge. Let us now give a theoretical justification
for convergence.
Consider three successive iterates . . . , $x$, $x'$, $x''$, . . . , where

\[ x' = Dx + c \quad \text{and} \quad x'' = Dx' + c \]

Then

\[ x'' - x' = (Dx' + c) - (Dx + c) = Dx' - Dx = D(x' - x) \tag{15} \]

Since $\|D\|_s < 1$, taking norms in (15), we have

\[ |x'' - x'|_s \le \|D\|_s\,|x' - x|_s < |x' - x|_s \]

This means that multiplying by $D$ shrinks the change in $x^{(k)}$ from one
iteration to the next by at least the factor $\|D\|_s$, so the change will
eventually shrink to zero.

If $x^*$ is the solution, so that $x^* = Dx^* + c$ (Theorem 1 guarantees
that this solution exists), then subtracting $x^* = Dx^* + c$ from
$x'' = Dx' + c$, just as in (15), we have

\[ x'' - x^* = (Dx' + c) - (Dx^* + c) = Dx' - Dx^* = D(x' - x^*) \tag{16} \]

and

\[ |x'' - x^*|_s \le \|D\|_s\,|x' - x^*|_s < |x' - x^*|_s \tag{17} \]

so the iterates are getting closer and closer (that is, converging) to the
solution $x^*$.
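The shrinking described by (15) can be checked numerically. The following is a rough sketch under stated assumptions: the helper names are invented here, the vector sum norm is taken to be the sum of absolute values, the matrix sum norm is taken to be the largest column sum (as this chapter uses it), and D and c are the Leontief data of model (4).

```python
D = [
    [0.4, 0.2, 0.2, 0.2],
    [0.3, 0.3, 0.2, 0.1],
    [0.1, 0.1, 0.0, 0.2],
    [0.0, 0.1, 0.1, 0.0],
]
c = [100.0, 50.0, 100.0, 0.0]


def mat_vec(M, v):
    """Return the matrix-vector product Mv."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def vector_sum_norm(v):
    """Sum norm of a vector: the sum of the absolute values of its entries."""
    return sum(abs(v_i) for v_i in v)


def matrix_sum_norm(M):
    """Sum norm of a matrix: the largest column sum of absolute values."""
    return max(sum(abs(row[j]) for row in M) for j in range(len(M[0])))


norm_D = matrix_sum_norm(D)   # the first column gives .4 + .3 + .1 = .8

x = list(c)                   # start from x^(0) = c
previous_change = None
ratios = []                   # |x'' - x'|_s / |x' - x|_s for successive iterates
for _ in range(10):
    x_next = [d_i + c_i for d_i, c_i in zip(mat_vec(D, x), c)]
    change = vector_sum_norm([a - b for a, b in zip(x_next, x)])
    if previous_change is not None:
        ratios.append(change / previous_change)
    previous_change = change
    x = x_next

print(norm_D)
print([round(r, 3) for r in ratios])
```

Every printed ratio should lie below the norm of D, which is what the inequality following (15) promises.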

The following table gives the values we get in this iteration process
(with the numbers rounded to integers).


$x^{(0)} = [100, 50, 100, 0]$
$x^{(1)} = [170, 115, 115, 15]$
$x^{(2)} = [217, 160, 132, 23]$
$x^{(3)} = [250, 192, 142, 29]$
$x^{(4)} = [273, 214, 150, 33]$
$x^{(6)} = [300, 240, 159, 38]$
$x^{(8)} = [313, 253, 163, 41]$
$x^{(10)} = [319, 259, 166, 42]$
$x^{(12)} = [322, 262, 167, 43]$
x? = (325, 265, 168, 43] 
$x^{(20)} = [325, 265, 168, 43]$


All further $x^{(n)}$, $n > 20$, equal $x^{(20)}$. This is the same answer that
we obtained previously in (11c) using the geometric series approach.
Note that we started with a very poor estimate $x^{(0)} = c$. Suppose that
we had started with the more thoughtful guess of

$x^{(0)} = [300, 200, 200, 50]$

Then iterating, we obtain

$x^{(1)} = [310, 245, 160, 40]$
$x^{(2)} = [313, 253, 164, 41]$
$x^{(3)} = [317, 256, 165, 42]$


The first iteration is already quite close to the correct solution and, after
three iterations, all entries are within about 3% of the true solution.
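To compare the two starting guesses, the assignment x <- Dx + c can be repeated until the iterates stop changing. The sketch below is illustrative only: the function name `iterate`, the stopping tolerance, and the iteration cap are assumptions made for this example, not part of the text's method.

```python
D = [
    [0.4, 0.2, 0.2, 0.2],
    [0.3, 0.3, 0.2, 0.1],
    [0.1, 0.1, 0.0, 0.2],
    [0.0, 0.1, 0.1, 0.0],
]
c = [100.0, 50.0, 100.0, 0.0]


def mat_vec(M, v):
    """Return the matrix-vector product Mv."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def iterate(D, c, x0, tol=1e-9, max_iter=1000):
    """Repeat x <- Dx + c until no entry changes by more than tol.

    Returns the final iterate and the number of iterations performed.
    """
    x = list(x0)
    for k in range(1, max_iter + 1):
        x_new = [d_i + c_i for d_i, c_i in zip(mat_vec(D, x), c)]
        if max(abs(a - b) for a, b in zip(x_new, x)) <= tol:
            return x_new, k
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")


x_from_c, steps_from_c = iterate(D, c, x0=c)
x_from_guess, steps_from_guess = iterate(D, c, x0=[300.0, 200.0, 200.0, 50.0])

print([round(v) for v in x_from_c], steps_from_c)
print([round(v) for v in x_from_guess], steps_from_guess)
```

Both runs should settle on the same production levels; the better starting guess only reduces the number of passes needed.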

Now we show how this iterative approach is actually equivalent to the
previous geometric series solution method. Recall $x^{(0)} = c$ and then

\[ x^{(1)} = Dx^{(0)} + c = D(c) + c = Dc + c \]

Then

\[ x^{(2)} = Dx^{(1)} + c = D(Dc + c) + c = D^2c + Dc + c \]

Continuing, we find that

\[ x^{(n)} = D^nc + D^{n-1}c + \cdots + Dc + c = \sum_{k=0}^{n} D^kc \tag{18} \]

So this iterative method is just computing the partial sums in the
geometric series for $(I - D)^{-1}c$ [see (12)].

A starting value $x^{(0)}$ other than $c$ may speed or slow convergence, but
it cannot prevent convergence. We note that large real-world economic models
are always solved by iterative methods, not by the elimination methods
taught in standard mathematics books (the reason is that iteration goes
quickly in real-world problems where the matrix $D$ is mostly 0's).

We now ask the question: Can we adapt this iterative technique to 
solving general systems of equations? The following theorem tells how to 
convert a system of equations into a form similar to a Leontief system. 


Theorem 2. Given the system of equations $Ax = b$, let $D = I - A$, so
$A = I - D$. Let the system be rewritten as

\[ (I - D)x = b \quad \text{or} \quad x = Dx + b \tag{19} \]

If $\|D\| = \|I - A\| < 1$ (in any matrix norm), the iteration method

\[ x^{(k)} = Dx^{(k-1)} + b \tag{20} \]

converges to the solution of $Ax = b$.




With this conversion, $\|D\| < 1$ guarantees the iteration (20) converges
to $(I - D)^{-1}b$, the required solution.


Example 5. Iteration Solution of an Oil 
Refinery Model 
We return to our oil refinery model. Each refinery produces three
petroleum-based products, heating oil, diesel oil, and gasoline, and $x_i$
is the number of barrels of petroleum used by the $i$th refinery.

\[
\begin{aligned}
20x_1 + 4x_2 + 4x_3 &= 500 \\
10x_1 + 14x_2 + 5x_3 &= 850 \\
5x_1 + 5x_2 + 12x_3 &= 1000
\end{aligned}
\tag{21}
\]


Let A be the coefficient matrix in (21) and b be the vector of 
right-side demands. Theorem 2 does not apply to the system Ax = b, 
since our favorite norm, the sum norm, of D (= I — A) is 34 (the 
largest column sum). We want to rewrite the equations in (21) to make 
Theorem 2 apply. 

To make the column sums or row sums of $I - A$ less than 1,
we can divide each column (or row) of $A$ by its largest entry. Dividing
entries in the columns this way is equivalent to changing the units of
the variables. That is, dividing the first column by 20 (its largest entry)
is equivalent to replacing $x_1$ (the number of barrels of input to refinery
1) by $x_1' = 20x_1$ (input measured in 1/20ths of a barrel).

For all three columns we have

\[ x_1' = 20x_1, \quad x_2' = 14x_2, \quad x_3' = 12x_3 \tag{22} \]

and hence

\[ x_1 = \frac{x_1'}{20}, \quad x_2 = \frac{x_2'}{14}, \quad x_3 = \frac{x_3'}{12} \]


This change of variables divides the coefficients in the first column by 
20, the coefficients in the second column by 14, and the coefficients 
in the third column by 12. 


\[
\begin{aligned}
x_1' + \tfrac{4}{14}x_2' + \tfrac{4}{12}x_3' &= 500 \\
\tfrac{10}{20}x_1' + x_2' + \tfrac{5}{12}x_3' &= 850 \\
\tfrac{5}{20}x_1' + \tfrac{5}{14}x_2' + x_3' &= 1000
\end{aligned}
\tag{23}
\]

It is very important that in the new system the main-diagonal entries
are all 1.
Now we can try to use Theorem 2. If $A'$ is the new coefficient
matrix in (23), the system for iteration is $x' = (I - A')x' + b$, or

\[
\begin{aligned}
x_1' &= -\tfrac{4}{14}x_2' - \tfrac{4}{12}x_3' + 500 \\
x_2' &= -\tfrac{10}{20}x_1' - \tfrac{5}{12}x_3' + 850 \\
x_3' &= -\tfrac{5}{20}x_1' - \tfrac{5}{14}x_2' + 1000
\end{aligned}
\tag{24}
\]

Note the nice form of (24): Each equation expresses one variable in
terms of the other variables. The main-diagonal entries on the right
side are 0 because the main-diagonal entries in (23) are 1.

In the matrix of coefficients $I - A'$ on the right side of (24), the
sum of the absolute values in each column is $< 1$. Thus
$\|I - A'\|_s < 1$, so Theorem 2 guarantees that iteration based on (24)
will converge.

For simplicity we let $x'^{(0)} = 0$. Iterating with (24), we get
(numbers are rounded to the nearest integer in this table)

\[
\begin{aligned}
x'^{(1)} &= [500, 850, 1000] \\
x'^{(2)} &= [-76, 183, 571] \\
x'^{(3)} &= [257, 650, 953] \\
x'^{(4)} &= [-3, 324, 704] \\
x'^{(5)} &= [173, 559, 885] \\
x'^{(10)} &= [84, 447, 797] \\
x'^{(11)} &= [106, 475, 819] \\
x'^{(20)} &= [97, 463, 810] \\
x'^{(21)} &= [98, 464, 810]
\end{aligned}
\tag{25}
\]

and eventually $x' \approx [97.5, 463.75, 810]$, with no further change.


Observe how our iterates oscillate above and below the final solution. 
This is due to the minus signs in (24). The reader should try to interpret 
the iteration process in terms of an iterative method a refinery manager 
might use to try and find the correct operating levels for the three 
refineries. 

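Converting $x'$ back into our original variables, we have

\[
\begin{aligned}
x_1 &= \frac{x_1'}{20} = \frac{97.5}{20} = 4\tfrac{7}{8} \\
x_2 &= \frac{x_2'}{14} = \frac{463.75}{14} = 33\tfrac{1}{8} \\
x_3 &= \frac{x_3'}{12} = \frac{810}{12} = 67\tfrac{1}{2}
\end{aligned}
\]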
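The column-scaling recipe of this example can be traced in a short Python sketch. It is an illustration under stated assumptions rather than a prescribed implementation: the variable names are invented here, the third equation of the refinery system is taken to be 5x1 + 5x2 + 12x3 = 1000, and the loop simply runs a fixed 200 passes instead of testing for convergence.

```python
# Refinery system Ax = b.
A = [
    [20.0, 4.0, 4.0],
    [10.0, 14.0, 5.0],
    [5.0, 5.0, 12.0],
]
b = [500.0, 850.0, 1000.0]
n = len(A)

# Column scaling: with x'_j = a_jj * x_j, column j of A is divided by a_jj.
A_scaled = [[A[i][j] / A[j][j] for j in range(n)] for i in range(n)]

# Iteration matrix I - A': zeros on the diagonal, negated off-diagonal entries.
D = [[(1.0 if i == j else 0.0) - A_scaled[i][j] for j in range(n)]
     for i in range(n)]

# Sum norm of D (largest column sum of absolute values); it must be below 1.
sum_norm = max(sum(abs(D[i][j]) for i in range(n)) for j in range(n))

# Iterate x' <- (I - A')x' + b starting from x' = 0.
x_scaled = [0.0] * n
for _ in range(200):
    x_scaled = [sum(D[i][j] * x_scaled[j] for j in range(n)) + b[i]
                for i in range(n)]

# Undo the change of variables to get back to barrels.
x = [x_scaled[j] / A[j][j] for j in range(n)]

print(round(sum_norm, 3))
print([round(v, 3) for v in x_scaled])
print([round(v, 3) for v in x])
```

The scaled iterates should approach [97.5, 463.75, 810], and dividing by the diagonal entries 20, 14, and 12 returns the operating levels in the original units.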


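In Example 5 we could also have rewritten the system of equations (21)
by dividing each row (each equation) by its largest entry. This yields

\[
\begin{aligned}
x_1 + \tfrac{4}{20}x_2 + \tfrac{4}{20}x_3 &= 25 \\
\tfrac{10}{14}x_1 + x_2 + \tfrac{5}{14}x_3 &= \tfrac{850}{14} \\
\tfrac{5}{12}x_1 + \tfrac{5}{12}x_2 + x_3 &= \tfrac{1000}{12}
\end{aligned}
\tag{26}
\]

Rewriting (26) in the form $x = (I - A'')x + b''$, we have

\[
\begin{aligned}
x_1 &= -\tfrac{4}{20}x_2 - \tfrac{4}{20}x_3 + 25 \\
x_2 &= -\tfrac{10}{14}x_1 - \tfrac{5}{14}x_3 + \tfrac{850}{14} \\
x_3 &= -\tfrac{5}{12}x_1 - \tfrac{5}{12}x_2 + \tfrac{1000}{12}
\end{aligned}
\tag{27}
\]

Observe that the sum of the second row of coefficients in (27),
$|-\tfrac{10}{14}| + |-\tfrac{5}{14}|$, is $> 1$, so the max norm is not
$< 1$ as required. The first column sum is also greater than 1, so Theorem 2
does not apply to (27) with either of these norms.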

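Dividing the $i$th row by the coefficient of $x_i$, as in (26)-(27), has the
advantage that it does not involve a change of variables. In (27), we are
simply solving the first equation of the original system for $x_1$ (in terms
of the other variables), solving the second equation for $x_2$, and solving
the third equation for $x_3$. For a general system of equations

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned}
\tag{28}
\]

the row equations for iteration become

\[
\begin{aligned}
x_1 &= -\frac{a_{12}}{a_{11}}x_2 - \frac{a_{13}}{a_{11}}x_3 - \cdots - \frac{a_{1n}}{a_{11}}x_n + \frac{b_1}{a_{11}} \\
x_2 &= -\frac{a_{21}}{a_{22}}x_1 - \frac{a_{23}}{a_{22}}x_3 - \cdots - \frac{a_{2n}}{a_{22}}x_n + \frac{b_2}{a_{22}} \\
&\;\;\vdots \\
x_n &= -\frac{a_{n1}}{a_{nn}}x_1 - \frac{a_{n2}}{a_{nn}}x_2 - \cdots - \frac{a_{n,n-1}}{a_{nn}}x_{n-1} + \frac{b_n}{a_{nn}}
\end{aligned}
\tag{29}
\]

Iteration using this system is called Jacobi iteration.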


It can be proven that iteration scheme (29) derived from row division 
converges if and only if the iteration scheme derived from column division 
[as in (24)] converges. The following theorem states what conditions must 
hold for this iteration scheme to work, that is, conditions so that after row 
or column division the max or sum norm will be < 1. 


Theorem 3. Jacobi iteration using the system (29) converges if either of
the following two conditions holds:
(i) For each row $i$, the coefficient $a_{ii}$ of $x_i$ is larger in
absolute value than the sum of the absolute values of the other
coefficients in the row:

\[ \sum_{j \ne i} |a_{ij}| < |a_{ii}| \quad \text{for each } i \tag{30} \]

or
(ii) For each column $j$, the coefficient $a_{jj}$ of $x_j$ is larger in
absolute value than the sum of the absolute values of the other
coefficients in the column:

\[ \sum_{i \ne j} |a_{ij}| < |a_{jj}| \quad \text{for each } j \tag{31} \]


The reader should check that condition (31) was satisfied in the refinery
problem. It is a straightforward exercise to check that (31) guarantees that
after column division [as in (23)-(24)] the resulting matrix $D' = I - A'$
will have sum norm $< 1$, and that (30) guarantees that after row division
the max norm of $D'' = I - A''$ is $< 1$.
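Conditions (30) and (31) are easy to test mechanically, and the row form (29) is easy to iterate. The sketch below is a minimal illustration: the function names `row_dominant`, `column_dominant`, and `jacobi`, the tolerance, and the iteration cap are all assumptions of this example, and the refinery coefficients are those of (21) with third equation 5x1 + 5x2 + 12x3 = 1000.

```python
def row_dominant(A):
    """Condition (30): in each row, |a_ii| exceeds the sum of the other |a_ij|."""
    n = len(A)
    return all(abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))


def column_dominant(A):
    """Condition (31): in each column, |a_jj| exceeds the sum of the other |a_ij|."""
    n = len(A)
    return all(abs(A[j][j]) > sum(abs(A[i][j]) for i in range(n) if i != j)
               for j in range(n))


def jacobi(A, b, x0=None, tol=1e-10, max_iter=10000):
    """Jacobi iteration (29): solve equation i for x_i using the previous iterate.

    Returns the final iterate and the number of iterations performed.
    """
    n = len(A)
    x = [0.0] * n if x0 is None else list(x0)
    for k in range(1, max_iter + 1):
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")


A = [
    [20.0, 4.0, 4.0],
    [10.0, 14.0, 5.0],
    [5.0, 5.0, 12.0],
]
b = [500.0, 850.0, 1000.0]

print(row_dominant(A), column_dominant(A))
x, iterations = jacobi(A, b)
print([round(v, 3) for v in x], iterations)
```

For the refinery matrix the row test fails (in the second row, 14 is not larger than 10 + 5) while the column test passes, which matches the discussion of (24) and (27); the row-form iteration still converges, as the text states it must.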

There are other iteration methods based on more advanced theory (see
numerical analysis references). Exercise 16 mentions a simple way, called
Gauss-Seidel iteration, to speed up the convergence of Jacobi iteration.

We conclude this section by linking the iteration, $x^{(k)} = Ax^{(k-1)}$, for
the dominant eigenvector at the start of this section with the iteration in
Theorem 2, $x^{(k)} = Dx^{(k-1)} + b$, for solving the system $(I - D)x = b$.
The following trick converts the latter iteration into the former.

Define the $(n + 1)$-vector $x^*$ and the $(n + 1)$-by-$(n + 1)$ matrix $D^*$:

\[
x^* = [x_1, x_2, \ldots, x_n, 1], \qquad
D^* = \begin{bmatrix} D & b \\ 0 & 1 \end{bmatrix}
\tag{32}
\]

For example, for the Leontief matrix $D$ in (4), $D^*$ is

\[
D^* = \begin{bmatrix}
.4 & .2 & .2 & .2 & 100 \\
.3 & .3 & .2 & .1 & 50 \\
.1 & .1 & 0 & .2 & 100 \\
0 & .1 & .1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]

The reader should check that

\[ x^{*(k)} = D^*x^{*(k-1)} \quad \text{is equivalent to} \quad x^{(k)} = Dx^{(k-1)} + b \]

Then the iteration scheme $x^{(k)} = Dx^{(k-1)} + b$ of Theorem 2 will converge
to a solution in which for large $k$,

\[ x^{(k)} = x^{(k-1)} \]

if and only if for large $k$,

\[ x^{*(k)} = D^*x^{*(k-1)} = x^{*(k-1)} \]

But the latter condition for the $x^*$'s means that the dominant eigenvalue of
$D^*$ is 1.


Theorem 4. The iteration scheme $x^{(k)} = Dx^{(k-1)} + b$ for solving
$(I - D)x = b$ converges to a solution if and only if the dominant
eigenvalue of the augmented matrix $D^*$ [see (32)] is 1.
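The equivalence behind Theorem 4 can be seen by running the two iterations side by side. This is a small sketch with invented variable names; it assumes the Leontief data of model (4), with the demand vector c playing the role of b, and an arbitrary choice of 60 passes.

```python
D = [
    [0.4, 0.2, 0.2, 0.2],
    [0.3, 0.3, 0.2, 0.1],
    [0.1, 0.1, 0.0, 0.2],
    [0.0, 0.1, 0.1, 0.0],
]
b = [100.0, 50.0, 100.0, 0.0]
n = len(D)


def mat_vec(M, v):
    """Return the matrix-vector product Mv."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


# D* of (32): D with b appended as an extra column, then the row [0, ..., 0, 1].
D_star = [D[i] + [b[i]] for i in range(n)] + [[0.0] * n + [1.0]]

x = list(b)               # x^(0) = b
x_star = list(b) + [1.0]  # x*^(0) = [x^(0), 1]
for _ in range(60):
    x = [s + b_i for s, b_i in zip(mat_vec(D, x), b)]   # x <- Dx + b
    x_star = mat_vec(D_star, x_star)                    # x* <- D* x*

# One more multiplication by D* should leave the converged x* unchanged,
# i.e. x* behaves like an eigenvector of D* with eigenvalue 1.
x_star_next = mat_vec(D_star, x_star)

print([round(v, 2) for v in x])
print([round(v, 2) for v in x_star])
```

The first n entries of x* track x exactly, and the final entry stays at 1 throughout.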


Section 3.4 Exercises 


Summary of Exercises 

Exercises 1—5 involve iterative methods for determining the largest eigen- 
value and its eigenvector. Exercises 6—9 involve solutions by sum of powers. 
Exercises 10—15 deal with iterative methods to solve a system of equations. 
Exercises 16 and 17 introduce related iterative methods, and Exercise 16 
introduces the Gauss—Seidel iteration. 


1. Use iteration to determine the dominant eigenvalue and an associated 
eigenvector for the following systems of equations. 


bbl ae i a i's =* a ol 
Tider ih lee th aoe ea | =1 1 2 


oat a a 
Use [1, 2] or [1, 2, 0] as your starting vector. This means that for part
(a), you iterate the system


x, = Ixy, + 1% 
QO, 4 2a, 


x} 


Use the Rayleigh quotient to refine your estimate of the dominant
eigenvalue.


2. Repeat the iteration in Exercise 1 using the vector [1, 1] or [1, 1, 1] as
a starting vector. How does this affect the speed of convergence to the
dominant eigenvector? For one of the matrices, you do not converge to
the dominant eigenvector. Why?


3. (a) Use iteration to determine the dominant eigenvalue and an
associated eigenvector for the following system of equations. Use [1, 0]
as your starting vector.


x' = .707x - .707y
y' = .707x + .707y


(b) Plot the successive iterates on x-y graph paper. Try other starting
vectors. State in words the effect in x-y coordinates of this linear
model.

(c) Solve the characteristic equation $\det(A - \lambda I) = 0$ to determine
the eigenvalues for this matrix of coefficients. You are finding out
that imaginary eigenvalues correspond to rotations. Note that .707
= sin 45° = cos 45°.


4. (a) Use iteration to determine the dominant eigenvalue and an
associated eigenvector for the following system of equations. Use [1, 1]
as your starting vector.


x' = 2x - y
y' = 3x - 2y


(b) Repeat the iteration starting with [0, 1].

(c) Solve the characteristic equation $\det(A - \lambda I) = 0$ to determine
the eigenvalues for this matrix of coefficients. Does this give you
any hints about what was wrong in the iteration procedure in
part (b)?


5. (a) Use iteration to determine the dominant eigenvalue and an
associated eigenvector for the following system of equations. Use
[1, 0, 0] as your starting vector. You have to iterate a long time
to get the iterates to stabilize at an eigenvector.
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~ 


+ ay + 2 
= 4x 


So ae 


~~ 


.6y 


N 


(This is a population growth that we will study in Section 4.5; here 
x is number of babies, y number of adolescents, and z adults.) 

(b) The other two eigenvalues of this system are — 1.118 and — .152. 
How do these eigenvalues help explain the slow convergence of 
the iteration procedure? 


Hint: See equation (2). 


6. Suppose that we want to solve the same Leontief system as was solved
in Example 4, but now the consumer demand vector c has been changed.
Use the formula in equation (11a) with the following new c's to
determine the new vector x of production levels.
(a) c = [50, 50, 50, 100]   (b) c = [0, 100, 0, 0]
(c) c = [0, 0, 100, 0]      (d) c = [0, 0, 50, 50]
(e) c = [100, 10, 10, 100]


7. Use the sum-of-powers method in equations (10)—(11) to solve the 
following Leontief systems. 


(a) x). = 42, + 2% + .2% +1100 
X, = .2x, + .1x, + 100 
Xz = .2x, + .ix,. + 100 


(b) x, = .3x, + .lxe + .2a, + 100 
Xo = Ry Site Ha F100 
X3 = lx, + Lx; + 100 


Use computer programs for both systems; convergence is fast because 
the norm of D is small—you only need to go up to the sixth power of 
the coefficient matrix D. 


8. Find the inverse of the following matrices by writing them in the form 
I — D and using the sum-of-powers method on D. Check the accuracy 
of your answer by using the determinant-based formula for the inverse 
of a 2-by-2 matrix (see Section 3.3). 


wie 6 3 6 0 
) De “ ©) M < “ ° | 


9. Try using the sum-of-powers method to solve the following system of 
equations. Why does the method fail? 


xX, = .4x, + .3x, + .4x, + 100 
3X, + .4x, + .6x, + 100 
1%, + 8x +..5x%_ 4+ 100 


X> 


X3 


10. Use the iteration method in equation (14) to solve the Leontief systems
in Exercise 7.


11. Consider the following systems of equations.

(i) 1X, + 2% + 2x, = 30 (ii) 6x, + 3x + xy IS 
Xt Sxz + 3x, = 10 23% Oty Sy SG 
2 F Sis F Bx, = 12 X, +t XxX, + 4x; = 10 


(a) Use the formulation in (29) to rewrite the systems in the form
$x = Dx + b$ with $\|D\| < 1$.

(b) Solve this system by iteration as described in Theorem 2, starting
with $x^{(0)} = [0, 0, 0]$.

(c) Repeat part (b) with starting vector $x^{(0)} = [100, 100, 100]$.


12. In the two systems of equations in Exercise 11, divide each column by
the main-diagonal entry and rewrite as $x' = Dx' + b$, as done in
Example 5. Then solve the systems by iteration, starting with
$x'^{(0)} = [0, 0, 0]$. Are the iterates the same as in Exercise 11
(allowing for the changes of variable)?


13. Consider the system of equations $Ax = b$, where
4 -—3 
A =| -3 Sw hl 
+ et 


and b = [10, 20, 30] 

(a) Rearrange the equations (rows) and divide each equation by appro- 
priate numbers so that this system can be rewritten in the form 
x = Dx + b with ||D|| < 1. 

(b) Solve this system by iteration as described in Theorem 2. 


14. (a) For which of the following systems of equations does Theorem 3
apply [is (30) or (31) satisfied]?
(i) 3x; — 445,= 2 (il))6%) f+ 2x9 — x, =- 4 
2%, + X = 4 yo 3 +. 3 
oy Ky + 4, = 27 


(iii) 2x, + x = 3 
Ax, = X> — 5 
(b) In the systems where Theorem 3 does not apply directly, try to
rearrange the rows and/or divide the rows or columns by the largest
coefficient to make Theorem 3 apply.

(c) Try the iterative method for solving each system. Does the method
work on a system where Theorem 3 could not be made to apply?


15. (a) Suppose that we use a starting vector $x^{(0)} = w$ in the iteration
scheme $x^{(k+1)} = Dx^{(k)} + c$. Using the same reasoning as led to
equation (18), find formulas for $x^{(1)}$, $x^{(2)}$, and $x^{(n)}$ in terms
of D and w.

(b) Use your formula for $x^{(n)}$ in part (a) and the fact that
$\|D\| < 1$ to show that the starting vector does not influence the final
values in the iteration process.


16. A well-known method to speed up the convergence of Jacobi iteration,
called Gauss-Seidel iteration, is to use the new value of $x_1$ obtained
from the first equation in the second and third equations (in place of
the previous value of $x_1$); similarly, the new value for $x_2$ is used in
the third equation. In the refinery problem, the first two equations in (24)
are

\[
\begin{aligned}
x_1' &= -\tfrac{4}{14}x_2' - \tfrac{4}{12}x_3' + 500 \\
x_2' &= -\tfrac{10}{20}x_1' - \tfrac{5}{12}x_3' + 850
\end{aligned}
\]

Starting with $x'^{(0)} = [0, 0, 0]$, we would compute $x_1'$ as
$-\tfrac{4}{14}(0) - \tfrac{4}{12}(0) + 500 = 500$. Then we use this value of
500 for $x_1'$ in the second equation to compute $x_2'$ as
$-\tfrac{10}{20}(500) - \tfrac{5}{12}(0) + 850 = 600$. The third equation
would use the values for both $x_1'$ and $x_2'$ just computed.

(a) Use Gauss-Seidel iteration on the refinery problem [the three
equations in (24)] starting with $x'^{(0)} = [0, 0, 0]$. How many iterations
are required to attain the solution vector [97.5, 463.75, 810]?

(b) Use Gauss-Seidel iteration to solve the Leontief system in
Example 4.

(c) Use Gauss-Seidel iteration to solve the system of equations in
Exercise 11, part (i).


17. Another method of iteration is to average the two previous iterates. In
an iteration process such as (25), where the iterates are oscillating above
and below the final solution, such an averaging will speed up convergence.
On the other hand, when the iterates are increasing as in the Leontief
system, this averaging method will slow down the convergence.

(a) Use this method of averaging the two previous iterates to re-solve
the refinery equations [equations (24)] starting again with
$x'^{(0)} = [0, 0, 0]$. How many iterations are required to attain the
solution vector [97.5, 463.75, 810]?

(b) Use the method of averaging to re-solve the Leontief system in the
text. How many iterations are required to attain the solution?


Numerical Analysis of
Systems of Equations


Computational Complexity of Solving 
Systems of Linear Equations 


In this section we look at some of the numerical difficulties and shortcuts
that are possible during elimination computations. Numerical linear algebra
is a large, growing field (see the References). We shall just touch on some
of the basic results. Our discussion will concern systems of n equations in
n unknowns, but generally our results also apply to systems of m equations
in n unknowns.

The first issue we address is the computational complexity of elimi- 
nation: How many arithmetic operations are required to solve a system of n 
equations in n unknowns (when the answer is unique)? We shall measure 
computation in terms of the number of multiplications required (division will 
be treated as equivalent to multiplication); the number of additions and sub- 
tractions is always about the same as the number of multiplications. 

The fundamental computation in Gaussian elimination is subtracting a
multiple $l_{ki}$ of row $i$ from row $k$, $k = i + 1, i + 2, \ldots, n$:
this operation makes entry $(k, i)$ zero. Each entry in row $i$ must be
multiplied by $l_{ki}$ and then subtracted from the corresponding entry in
row $k$, for $k > i$. There are $n - i + 1$ entries in row $i$ (the entries
to the right of the main diagonal plus the right side value) involved, and
$n - i$ rows below row $i$. So there are approximately $(n - i + 1)(n - i)$
multiplications; for simplicity, we say about $(n - i)^2$ multiplications.
To perform elimination of $x_1, x_2, \ldots, x_{n-1}$ requires

\[ (n - 1)^2 + (n - 2)^2 + \cdots + 1^2 \approx \frac{n^3}{3} \quad \text{multiplications} \]


When elimination is finished, it takes one division to compute $x_n$, one
multiplication and one subtraction (and one division) to compute $x_{n-1}$,
and generally $k$ multiplications and $k$ subtractions to compute $x_{n-k}$.
Altogether, back substitution requires about $n^2/2$ multiplications. For
large $n$, $n^2/2$ is negligible beside $n^3/3$.

Now let us quickly go over the operation count for elimination by
pivoting. The one difference is that a variable $x_i$ is now eliminated from
all the other $n - 1$ rows; in addition, every entry in the pivot row is
divided by the pivot entry. It takes $(n - i + 1)n$ multiplications to
eliminate $x_i$ from all other rows. Summing over all $i$, we get a grand
total of about $n^3/2$ multiplications. There is no back substitution.

If we want to compute the inverse by pivoting, then each row will
require $(n - i + n)$ multiplications, for $n$ (instead of 1) right-side
terms. The total number of multiplications works out to about $n^3$. One can
use Gaussian elimination with $n$ right-hand sides and multiple back
substitution, but the back substitution now is $n$ times more complicated;
the total process also requires about $n^3$ multiplications.
We summarize our discussion with a theorem.


Theorem 1. A system of n equations in n unknowns requires approximately
$n^3/3$ multiplications (and subtractions) to solve by Gaussian
elimination and $n^3/2$ multiplications to solve by pivoting. Either method
requires approximately $n^3$ multiplications to invert an n-by-n matrix.
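The counts in Theorem 1 can be tallied directly from the reasoning above. The following sketch only adds up the per-step counts given in the text; the function names are invented for this illustration, and divisions used to form the multipliers are left out of the tally.

```python
def elimination_multiplications(n):
    """Multiplications needed to reduce an n-by-n system to triangular form.

    Eliminating x_i touches the n - i rows below row i, and in each of them
    the n - i + 1 entries to the right of the diagonal (including the right side).
    """
    count = 0
    for i in range(1, n):                 # eliminate x_1, ..., x_(n-1)
        rows_below = n - i
        entries_per_row = n - i + 1
        count += rows_below * entries_per_row
    return count


def back_substitution_multiplications(n):
    """k multiplications to compute x_(n-k), for k = 1, ..., n - 1."""
    return sum(range(1, n))


for n in (10, 50, 100):
    print(n,
          elimination_multiplications(n), round(n**3 / 3),
          back_substitution_multiplications(n), n**2 // 2)
```

For n = 50 the elimination tally is 41,650, against the estimate 50^3/3 of about 41,667, and back substitution adds only 1,225 more.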


If we are solving $Ax = b$ for several different right-hand sides, then
the best method is the LU method, presented in Section 3.2, of storing
elimination multipliers in L and the final reduced matrix in U. Applying the
multipliers in L to a new $b^*$ will require about $n^2/2$ multiplications.
As noted above, back substitution also requires $n^2/2$ multiplications.
Thus, using L and U, we can solve the new system $Ax = b^*$ with just $n^2$
multiplications. This is the same number of multiplications required to
compute the matrix-vector product $A^{-1}b^*$, the solution using the inverse
(assuming that $A^{-1}$ is known). However, the result with L and U will have
less roundoff error: Using $A^{-1}$ necessarily introduces some additional
error.

In Section 3.4 we introduced the Jacobi iteration method for solving
a system of equations. Each iteration requires a matrix-vector multiplication
that takes $n^2$ operations. If the matrix is sparse, fewer operations are
required (see Section 2.6). The problem is that we do not know how many
iterations will be necessary to converge to the solution. When n is large
and the coefficient matrix is sparse, an iteration method is likely to be
much faster.


Solving Tridiagonal Systems 


Let us next consider how much more quickly elimination can be performed
for a well-structured sparse matrix. In particular, let us look at a
tridiagonal matrix, whose only nonzero entries are on the main diagonal and
just to the left and right of the main diagonal.

Look at the form of a tridiagonal matrix before and after the elimination
step for $x_i$. A * indicates a nonzero entry.

\[
\text{before:}\quad
\begin{array}{r|ccccc}
\text{row } i & 0 & * & * & 0 & 0 \\
\text{row } i+1 & 0 & * & C & * & 0
\end{array}
\qquad
\text{after:}\quad
\begin{array}{r|ccccc}
\text{row } i & 0 & * & * & 0 & 0 \\
\text{row } i+1 & 0 & 0 & C & * & 0
\end{array}
\]

The only alteration was that entry $(i + 1, i)$ became 0, and entry
$(i + 1, i + 1)$, marked with a C, changed to a new nonzero value. Only
entry C's new value must be computed, together with a change in the right
side of row $i + 1$. This requires only two multiplications and two
subtractions, plus one division to find the elimination multiplier.

Back substitution for each row in the reduced system will require one
multiplication, one subtraction, and one division, since each row in the
reduced system looks like $qx_i + rx_{i+1} = s$ (where $x_{i+1}$ is already
known), so $x_i = (s - rx_{i+1})/q$. Together, we have


Theorem 2. An n-by-n tridiagonal system of equations can be solved by
Gaussian elimination in just 5n multiplications and 3n subtractions.


This is an incredibly fast result. Compared with the normal $n^3/3$
multiplications in Theorem 1, this means that solving a 50-by-50 tridiagonal
system requires about 250 multiplications versus over 40,000 operations for
a full 50-by-50 matrix. Savings are possible on any band matrix.
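The elimination and back-substitution steps just counted translate into a very short routine. This is a minimal sketch, not a production solver: the function name and the three-list storage scheme are choices made for this illustration, and it assumes no pivot turns out to be zero (no row exchanges are attempted).

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by Gaussian elimination without row exchanges.

    lower[i] is the entry just left of the diagonal in row i (lower[0] is unused),
    diag[i] is the diagonal entry, upper[i] is the entry just right of the
    diagonal (upper[-1] is unused), and rhs[i] is the right side of row i.
    """
    n = len(diag)
    d = list(diag)
    r = list(rhs)
    # Forward elimination: each step changes one diagonal entry and one right side.
    for i in range(1, n):
        multiplier = lower[i] / d[i - 1]      # one division
        d[i] -= multiplier * upper[i - 1]     # one multiplication, one subtraction
        r[i] -= multiplier * r[i - 1]         # one multiplication, one subtraction
    # Back substitution: row i now reads d[i]*x[i] + upper[i]*x[i+1] = r[i].
    x = [0.0] * n
    x[n - 1] = r[n - 1] / d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x


# Small check: diagonal entries 2, off-diagonal entries -1, and a right side
# chosen so that the exact solution is [1, 2, 3, 4].
lower = [0.0, -1.0, -1.0, -1.0]
diag = [2.0, 2.0, 2.0, 2.0]
upper = [-1.0, -1.0, -1.0, 0.0]
rhs = [0.0, 0.0, 0.0, 5.0]
solution = solve_tridiagonal(lower, diag, upper, rhs)
print([round(v, 6) for v in solution])
```

Storing only the three diagonals keeps both the work and the memory proportional to n, which is where the savings over a full matrix come from.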

Let us illustrate the speed of elimination on a tridiagonal matrix we 
have seen frequently in this book. 


Example 1. Computing the Stable Probability 
Distribution for the Frog 
Markov Chain 


We return to the familiar frog Markov chain that has six states
(representing different positions in the highway). The transition matrix is

\[
A = \begin{bmatrix}
.50 & .25 & 0 & 0 & 0 & 0 \\
.50 & .50 & .25 & 0 & 0 & 0 \\
0 & .25 & .50 & .25 & 0 & 0 \\
0 & 0 & .25 & .50 & .25 & 0 \\
0 & 0 & 0 & .25 & .50 & .50 \\
0 & 0 & 0 & 0 & .25 & .50
\end{bmatrix}
\tag{1}
\]

By letting this Markov chain run for many iterations, we found
in Section 1.3 that the probability distribution approached a stable
distribution $p^*$ with the property $Ap^* = p^*$. In matrix algebra
terminology, $p^*$ is an eigenvector of A with eigenvalue 1. Let us solve
the matrix system

\[ Ap = p \quad \text{or, equivalently,} \quad (A - I)p = 0 \]

That is,

\[
\begin{aligned}
-.50p_1 + .25p_2 &= 0 \\
.50p_1 - .50p_2 + .25p_3 &= 0 \\
.25p_2 - .50p_3 + .25p_4 &= 0 \\
.25p_3 - .50p_4 + .25p_5 &= 0 \\
.25p_4 - .50p_5 + .50p_6 &= 0 \\
.25p_5 - .50p_6 &= 0
\end{aligned}
\tag{2}
\]

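Use Gaussian elimination. To eliminate $p_1$ from the second equation in
(2), we add the first equation in (2) to the second and obtain

\[
\begin{aligned}
-.50p_1 + .25p_2 &= 0 \\
-.25p_2 + .25p_3 &= 0 \\
.25p_2 - .50p_3 + .25p_4 &= 0 \\
.25p_3 - .50p_4 + .25p_5 &= 0 \\
.25p_4 - .50p_5 + .50p_6 &= 0 \\
.25p_5 - .50p_6 &= 0
\end{aligned}
\tag{3}
\]

To eliminate $p_2$ from the third equation in (3), we add the second
equation to the third.

\[
\begin{aligned}
-.50p_1 + .25p_2 &= 0 \\
-.25p_2 + .25p_3 &= 0 \\
-.25p_3 + .25p_4 &= 0 \\
.25p_3 - .50p_4 + .25p_5 &= 0 \\
.25p_4 - .50p_5 + .50p_6 &= 0 \\
.25p_5 - .50p_6 &= 0
\end{aligned}
\tag{4}
\]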

A simple pattern is emerging of simply adding the $i$th equation to the
$(i + 1)$st equation. To eliminate $p_3$ from the fourth equation in (4),
we add the third equation to the fourth and after that we eliminate $p_4$
from the fifth equation by adding the fourth equation to the fifth. After
these two further elimination steps, we have

\[
\begin{aligned}
-.50p_1 + .25p_2 &= 0 \\
-.25p_2 + .25p_3 &= 0 \\
-.25p_3 + .25p_4 &= 0 \\
-.25p_4 + .25p_5 &= 0 \\
-.25p_5 + .50p_6 &= 0 \\
.25p_5 - .50p_6 &= 0
\end{aligned}
\tag{5}
\]

Now, however, we are on the verge of a problem we had back in Section
3.2: The last equation in (5) is just the negative of the fifth equation.
When we add the fifth equation to the sixth equation to eliminate $p_5$,
we obtain

\[
\begin{aligned}
-.50p_1 + .25p_2 &= 0 \\
-.25p_2 + .25p_3 &= 0 \\
-.25p_3 + .25p_4 &= 0 \\
-.25p_4 + .25p_5 &= 0 \\
-.25p_5 + .50p_6 &= 0 \\
0 &= 0
\end{aligned}
\tag{6}
\]

Multiplying all equations in (6) by 4 and bringing one term in each
equation to the right side, we have the simple system

\[
\begin{aligned}
2p_1 &= p_2, & p_2 &= p_3, \\
p_3 &= p_4, & p_4 &= p_5, \\
p_5 &= 2p_6, & 0 &= 0
\end{aligned}
\tag{7}
\]

Our problem is that an eigenvector is really a family of eigenvectors:
Any multiple of an eigenvector is again an eigenvector. We can give
any value $q$ to $p_1$; then from (7), $2p_1 = p_2$ implies that $p_2 = 2q$
and further that $p_5 = p_4 = p_3 = p_2 = 2q$. Finally, $p_5 = 2p_6$ gives
$p_6 = q$. Then

\[ p = [q, 2q, 2q, 2q, 2q, q] \tag{8} \]

But we want a special eigenvector, one that is a probability
distribution, whose entries sum to 1. Requiring that the components in
p sum to 1, we have the constraint

\[ q + 2q + 2q + 2q + 2q + q = 1 \quad \text{or} \quad 10q = 1 \;\Rightarrow\; q = .1 \]

Thus our stable distribution is

\[ p^* = [.1, .2, .2, .2, .2, .1] \]

This is the same result we got by iteration in Section 1.3.

The simple nature of the elimination process in (2)-(6) shows that if
the frog were on a 10-lane superhighway yielding a 12-by-12 transition
matrix, the computations to find the stable distribution would still be easy
(the n-lane problem is solved in Section 4.4). Tridiagonal systems arising
from Markov chains and other real-world problems often have a simple
elimination pattern. For example, in Section 4.7 we shall easily solve a
100-by-100 tridiagonal system.
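The elimination in Example 1 can be replayed in code. The sketch below is illustrative and rests on assumptions: the transition matrix is rebuilt from the coefficients of system (2), the variable names are invented, and the free variable is taken to be the last one (p6 = q) rather than p1, which by the symmetry of (7) leads to the same family of solutions.

```python
# Transition matrix A of the six-state frog Markov chain.
A = [
    [0.50, 0.25, 0.00, 0.00, 0.00, 0.00],
    [0.50, 0.50, 0.25, 0.00, 0.00, 0.00],
    [0.00, 0.25, 0.50, 0.25, 0.00, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.00, 0.25, 0.50, 0.50],
    [0.00, 0.00, 0.00, 0.00, 0.25, 0.50],
]
n = len(A)

# Coefficient matrix A - I of system (2).
M = [[A[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

# Forward elimination: in a tridiagonal matrix only the entry just below each
# pivot has to be removed, and only two entries of the next row change.
for i in range(n - 1):
    multiplier = M[i + 1][i] / M[i][i]
    for j in range(i, min(i + 2, n)):
        M[i + 1][j] -= multiplier * M[i][j]

# The last row is now all zeros, so p_6 is free. Take p_6 = 1 and back-substitute
# through row i, which reads M[i][i]*p_i + M[i][i+1]*p_(i+1) = 0.
p = [0.0] * n
p[n - 1] = 1.0
for i in range(n - 2, -1, -1):
    p[i] = -M[i][i + 1] * p[i + 1] / M[i][i]

# Scale the eigenvector so that its entries sum to 1.
total = sum(p)
p_star = [v / total for v in p]
print(p_star)
```

Before scaling, the vector is [1, 2, 2, 2, 2, 1], the q = 1 member of the family (8); dividing by the total 10 gives the stable distribution.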

One of the dangers in elimination in general sparse matrices (not band 
matrices) is fill-in. Fill-in is the creation of new nonzero entries during the 
elimination process, the loss of sparseness. Every nonzero entry created 
below the main diagonal will require additional computation to eliminate it 
later. In elimination by pivoting, nonzero entries above the main diagonal 
also cause extra work. To illustrate the trickiness of fill-in, observe what 
happens in elimination by pivoting when we re-solve the stable probability 
distribution in Example 1. 


Example 2. Sparse Matrix Fill-in 


After pivoting on entry (1, 1), we have the same result as in (2) except 
that the first row is [1, —.25, 0, 0, 0, O], since we divide the pivot 
row by the pivot entry. After pivoting on entry (2, 2), we obtain 


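\[
\begin{aligned}
\text{(a)}\quad & p_1 - .50p_3 = 0 \\
\text{(b)}\quad & p_2 - p_3 = 0 \\
\text{(c)}\quad & -.25p_3 + .25p_4 = 0 \\
\text{(d)}\quad & .25p_3 - .50p_4 + .25p_5 = 0 \\
\text{(e)}\quad & .25p_4 - .50p_5 + .50p_6 = 0 \\
\text{(f)}\quad & .25p_5 - .50p_6 = 0
\end{aligned}
\tag{9}
\]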


When we pivot on entry (3, 3), we have to remove the nonzeros in 
entries (1, 3) and (2, 3). The result is 


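\[
\begin{aligned}
\text{(a')} = \text{(a)} + .5\,\text{(c')}: \quad & p_1 - .50p_4 = 0 \\
\text{(b')} = \text{(b)} + \text{(c')}: \quad & p_2 - p_4 = 0 \\
\text{(c')} = \text{(c)}/(-.25): \quad & p_3 - p_4 = 0 \\
\text{(d')} = \text{(d)} - .25\,\text{(c')}: \quad & -.25p_4 + .25p_5 = 0 \\
\text{(e')} = \text{(e)}: \quad & .25p_4 - .50p_5 + .50p_6 = 0 \\
\text{(f')} = \text{(f)}: \quad & .25p_5 - .50p_6 = 0
\end{aligned}
\tag{10}
\]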


We just pushed the nonzero entries over one column and now will
have to deal with them on the next pivot. So when it is time to pivot
on entry (i, i) in this system, the entries above the main diagonal in
column i will all be nonzero. When we used Gaussian elimination in
Example 1, we never had such problems.
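
The fill-in pattern of Example 2 can be checked with a short computation. The following Python sketch is only an illustration: it assumes the 6-state frog transition matrix of Section 1.3 (not reproduced in this section; the entries below are inferred from the equations in (9)), forms A - I, and pivots on entries (1, 1) through (5, 5). The names `pivot` and `fill_counts` are hypothetical helpers, not part of the text.

```python
from fractions import Fraction as F

# Assumed frog transition matrix A (each column sums to 1), inferred from (9).
A = [
    [F(1, 2), F(1, 4), 0, 0, 0, 0],
    [F(1, 2), F(1, 2), F(1, 4), 0, 0, 0],
    [0, F(1, 4), F(1, 2), F(1, 4), 0, 0],
    [0, 0, F(1, 4), F(1, 2), F(1, 4), 0],
    [0, 0, 0, F(1, 4), F(1, 2), F(1, 2)],
    [0, 0, 0, 0, F(1, 4), F(1, 2)],
]
n = len(A)
# Coefficient matrix of the homogeneous system (A - I)p = 0.
M = [[F(A[i][j]) - (1 if i == j else 0) for j in range(n)] for i in range(n)]


def pivot(M, i):
    """Elimination by pivoting on entry (i, i): scale row i, clear column i."""
    p = M[i][i]
    M[i] = [x / p for x in M[i]]
    for r in range(len(M)):
        if r != i and M[r][i] != 0:
            f = M[r][i]
            M[r] = [a - f * b for a, b in zip(M[r], M[i])]


fill_counts = []
for i in range(n - 1):
    pivot(M, i)
    # Count nonzeros in the next column within the rows already used as pivots.
    fill_counts.append(sum(1 for r in range(i + 1) if M[r][i + 1] != 0))

print(fill_counts)  # [1, 2, 3, 4, 5]
```

Under these assumptions, each pivot leaves one more nonzero entry above the main diagonal in the next column, which is the extra work described above. Gaussian elimination never touches the entries above the diagonal, so it avoids this.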
Stable Elimination


We consider now the problem of the stability of computations during elimi- 
nation. Will roundoff errors tend to grow and make the solution computed 
inaccurate, or will the errors stay small? The answer is that some systems 
of equations are inherently unstable, while others are very dependent on the 
order of the equations; that is, reordering the equations and variables can 
sometimes greatly reduce roundoff error. We shall discuss ways to choose 
a good arrangement and to estimate the underlying stability of the system 
of equations. 

To see how computations with systems of equations can be stable or 
unstable depending on the order of the equations or variables, consider the 
following example. 


Example 3. Roundoff Error in 
Elimination Computations 


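Gaussian elimination on the system

\[
\begin{aligned}
.0001x + y &= 1 \\
x + y &= 2
\end{aligned}
\tag{11}
\]

yields

\[
\begin{aligned}
.0001x + y &= 1 \\
-9999y &= -9998
\end{aligned}
\tag{12}
\]

from which we have y = .9999, so back substitution yields

\[ .0001x + .9999 = 1 \;\Rightarrow\; x = 1 \]

But suppose that roundoff error in the elimination had produced

\[
\begin{aligned}
.0001x + y &= 1 \\
-10{,}000y &= -10{,}000
\end{aligned}
\tag{13}
\]

yielding y = 1. Now back substitution gives

\[ .0001x + 1 = 1 \;\Rightarrow\; x = 0 \]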


Although the y value stays about the same, the difference in x values 
is very significant: (12) yields x = 1, and (13) yields x = 0. 

The problem came from the small size .0001 of a coefficient of 
x in (11). This coefficient is the pivot entry in elimination by pivoting. 
The antidote is to avoid pivoting on small entries. In the case of system 
(11), we should pivot on the coefficient of x in the second equation or 
on the coefficient of y in the first equation. In Gaussian elimination 
terms, we should interchange either the equations or the variables. 
Interchanging equations, we get

\[
\begin{aligned}
x + y &= 2 \\
.0001x + y &= 1
\end{aligned}
\tag{14}
\]

and Gaussian elimination gives

\[
\begin{aligned}
x + y &= 2 \\
.9999y &= .9998
\end{aligned}
\tag{15}
\]

yielding y = .9999 and x = 1.0001. Now there are no roundoff-error
problems: a small error in the computations in (15) will yield only a
small error in the values of x and y.

Interchanging the order of the variables in (11), we get

\[
\begin{aligned}
y + .0001x &= 1 \\
y + x &= 2
\end{aligned}
\tag{16}
\]

Now Gaussian elimination gives

\[
\begin{aligned}
y + .0001x &= 1 \\
.9999x &= 1
\end{aligned}
\tag{17}
\]

again yielding x = 1.0001 and y = .9999. Again, a small error in
computations in (17) has a small effect on x and y.
What a difference the rearrangements make!
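
The effect of the ordering can be reproduced with simulated low-precision arithmetic. The sketch below is an illustration only: it uses Python's `decimal` module to round every arithmetic result to three significant digits (the precision used in Exercise 13, slightly coarser than the four digits discussed above), and the function name `eliminate_2x2` is a hypothetical helper.

```python
from decimal import Decimal, localcontext


def eliminate_2x2(rows, digits):
    """Gaussian elimination (no interchanges) on a 2-by-2 system.

    rows is [[a, b, r], [c, d, s]] for ax + by = r, cx + dy = s, given as
    strings; each arithmetic result is rounded to `digits` significant digits.
    """
    with localcontext() as ctx:
        ctx.prec = digits
        (a, b, r), (c, d, s) = [[Decimal(v) for v in row] for row in rows]
        m = c / a              # multiple of equation 1 subtracted from equation 2
        d2 = d - m * b         # new coefficient of the second variable
        s2 = s - m * r         # new right side
        y = s2 / d2            # second variable from the reduced equation
        x = (r - b * y) / a    # back substitution into equation 1
        return x, y


system_11 = [["0.0001", "1", "1"], ["1", "1", "2"]]   # order used in (11)
system_14 = [["1", "1", "2"], ["0.0001", "1", "1"]]   # equations interchanged, as in (14)

bad = eliminate_2x2(system_11, 3)
good = eliminate_2x2(system_14, 3)
print("order (11): x =", bad[0], " y =", bad[1])
print("order (14): x =", good[0], " y =", good[1])
```

With three digits, the order of (11) gives x = 0 and y = 1, the same failure as in (13), while the interchanged order of (14) gives x = 1 and y = 1, correct to three digits.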


The immediate conclusion would seem to be to pick an entry that is
largest in its row and column for the pivot, and perform an exchange of
equations and/or variables to get this entry up to the first coefficient in the
first equation.

Unfortunately, the situation is more complicated than that. Suppose
that we multiply the first equation of (11) by $10^4$ and multiply the second
equation by $10^{-4}$. Further, let us replace x by $s = 10^{-4}x$ and y by
$t = 10^{4}y$ (so $x = 10^{4}s$, $y = 10^{-4}t$). These scaling transformations
convert (11) into

\[
\begin{aligned}
10^{4}\{.0001(10^{4}s) + (10^{-4}t) &= 1\} \\
10^{-4}\{(10^{4}s) + (10^{-4}t) &= 2\}
\end{aligned}
\]

or

\[
\begin{aligned}
10^{4}s + t &= 10^{4} \\
s + 10^{-8}t &= 2 \times 10^{-4}
\end{aligned}
\tag{18}
\]

The coefficient of s in the first equation is now the largest coefficient in
(18), but these scaling changes have not really changed the arithmetic. If
we solve (18) with possible roundoff error in the fourth significant digit, as
above in (13), the solution of (18) will be

\[ s = 0,\; t = 10^{4} \;\Rightarrow\; x = 10^{4}s = 0,\; y = 10^{-4}t = 1 \tag{19} \]

the same error as before.

To undo the confusion caused by (18), it is important before making
any pivot choices (i.e., rearrangement of equations or variables) to scale the
coefficient matrix: multiply equations and rescale variables by constants
chosen to make the largest entry in each equation and each column the same
size, say, equal to 1. Then pick an entry that is largest in its row and column
for the pivot. After eliminating one variable from the other equations, repeat
this process for the remaining n - 1 equations in n - 1 unknowns, and so
on for each successive choice of pivot entry.


Example 4. Stable Elimination

Let us apply the preceding advice to system (18).

\[
\begin{aligned}
10^{4}s + t &= 10^{4} \\
s + 10^{-8}t &= 2 \times 10^{-4}
\end{aligned}
\tag{18}
\]

We divide the first equation by $10^{4}$ to obtain

\[
\begin{aligned}
s + 10^{-4}t &= 1 \\
s + 10^{-8}t &= 2 \times 10^{-4}
\end{aligned}
\tag{20}
\]

The largest entry in each equation is 1, but the coefficients in the
t column are all too small. Let us replace t by $y = 10^{-4}t$. Then (20)
becomes

\[
\begin{aligned}
s + y &= 1 \\
s + .0001y &= .0002
\end{aligned}
\tag{21}
\]

Now we could pick y in the first equation or s in the second. Let us
pivot on the y term in the first equation. Interchanging the order of the
variables, we have

\[
\begin{aligned}
y + s &= 1 \\
.0001y + s &= .0002
\end{aligned}
\tag{22}
\]

Gaussian elimination gives

\[
\begin{aligned}
y + s &= 1 \\
.9999s &= .0001
\end{aligned}
\tag{23}
\]

yielding s = .0001 (approximately) and y = .9999. Recall that
$x = 10^{4}s$, so x = 1. The reader should check that small errors in (23)
lead to a small change in the answer.


We repeat the rule of stable elimination. This rule is called complete 
pivoting in the numerical analysis literature. 


Rule of Stable Elimination. First apply scaling to rows and columns
as necessary to make the largest entry in each row and column equal
to 1.

An entry that is the largest (in absolute value) entry in its row
and column should be chosen for the pivot. Interchange equations and
variables to make the pivot the first entry in the first equation. Now
eliminate the first variable from the remaining equations.

Repeat this whole process for each round of elimination. 
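
The pivot-selection part of the rule can be sketched in code. The Python function below is a minimal illustration, not a production solver: it omits the scaling step, picks the largest remaining entry in absolute value (which is automatically largest in its row and column), and uses ordinary floating-point numbers. The name `solve_complete_pivoting` is a hypothetical helper.

```python
def solve_complete_pivoting(A, b):
    """Solve Ax = b by Gaussian elimination with complete pivoting (no scaling)."""
    n = len(A)
    # Augmented matrix [A | b]; order[j] is the original variable in column j.
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    order = list(range(n))
    for k in range(n):
        # Largest entry (in absolute value) among the remaining rows and columns.
        r, c = max(((i, j) for i in range(k, n) for j in range(k, n)),
                   key=lambda ij: abs(M[ij[0]][ij[1]]))
        if M[r][c] == 0:
            raise ValueError("matrix is singular")
        M[k], M[r] = M[r], M[k]                      # interchange equations
        for row in M:                                # interchange variables
            row[k], row[c] = row[c], row[k]
        order[k], order[c] = order[c], order[k]
        for i in range(k + 1, n):                    # eliminate below the pivot
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution in the permuted variable order.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / M[i][i]
    # Undo the variable interchanges.
    x = [0.0] * n
    for j, original in enumerate(order):
        x[original] = z[j]
    return x


# System (11): .0001x + y = 1, x + y = 2.
x = solve_complete_pivoting([[0.0001, 1.0], [1.0, 1.0]], [1.0, 2.0])
print(x)
```

For system (11) the first pivot chosen is an entry of size 1 rather than .0001, so the small coefficient is never used as a divisor.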


Let us look next at the question of inaccurate solutions from the view
of the "person on the street" who needs to solve a system of equations. He
or she will probably not worry about inaccuracy until it happens. So the key
question is: How do you know if the solution x* that you computed to the
system of equations Ax = b is accurate? The answer is simple. If x* were
the true solution, Ax* would equal b. A simple measure of error is the
vector ε:

\[ \varepsilon = Ax^{*} - b \quad (\text{or } Ax^{*} = b + \varepsilon) \tag{24} \]

A true solution makes ε equal 0. If ε is unacceptably large, one should
re-solve the system from scratch using the stable elimination method just given.


Optional 


Maybe you already used this method, but the system was inherently unstable.
Then the best way to proceed is to correct x* as follows. If x° is the true
solution, our error in x*, e = x* - x°, satisfies the equation

\[ Ae = A(x^{*} - x^{\circ}) = Ax^{*} - Ax^{\circ} = (b + \varepsilon) - b = \varepsilon \tag{25} \]

So we should solve the equation Ae = ε (by the stable elimination
method) and subtract our solution e* from x* to get a corrected solution
x** = x* - e* for the original system.
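
As an illustration of this correction step, the following Python sketch takes the inaccurate answer x* = [0, 1] produced by (13) for system (11), computes the residual ε = Ax* - b, solves Ae = ε, and subtracts. The 2-by-2 solver uses Cramer's rule purely for brevity (an assumption of this sketch, standing in for the stable elimination method), and the helper names are hypothetical.

```python
def residual(A, x, b):
    """Return eps = Ax - b for a 2-by-2 system."""
    return [A[i][0] * x[0] + A[i][1] * x[1] - b[i] for i in range(2)]


def solve_2x2(A, rhs):
    """Solve a 2-by-2 system by Cramer's rule (adequate for a tiny illustration)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(rhs[0] * A[1][1] - A[0][1] * rhs[1]) / det,
            (A[0][0] * rhs[1] - rhs[0] * A[1][0]) / det]


A = [[0.0001, 1.0], [1.0, 1.0]]      # coefficient matrix of (11)
b = [1.0, 2.0]
x_star = [0.0, 1.0]                  # inaccurate solution obtained from (13)

eps = residual(A, x_star, b)         # eps = Ax* - b
e_star = solve_2x2(A, eps)           # solve Ae = eps
x_corrected = [xs - es for xs, es in zip(x_star, e_star)]   # x** = x* - e*
print("residual:", eps)
print("corrected solution:", x_corrected)
```

Here the residual is [0, -1], which flags the error immediately, and the corrected solution is approximately [1.0001, .9999].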
Condition Number of a Matrix


Now we turn to the study of inherently unstable systems. We saw a case of
instability back in Section 1.2 with the canoe-with-sail equations, which had
the form

\[
\begin{aligned}
x_1 + kx_2 &= b_1 \\
x_1 + x_2 &= b_2
\end{aligned}
\tag{26}
\]


When k became close to 1, these two equations represented almost 
parallel lines. A small change in the value of k near 1 has a large effect on 
where these lines will intersect. The choice of pivots is not at issue here. 

In Section 3.1 we obtained Cramer's rule, a determinant-based formula,
for the solution to a system of equations. This formula involves division
by the determinant of the coefficient matrix. For the coefficient matrix
A in (26), det(A) = 1·1 - k·1 = 1 - k. As k → 1, det(A) → 0. So
in Cramer's rule we are almost dividing by 0 and problems will abound.

The critical issue is not just the size of the det(A) but the size of 
det(A) relative to the size of the entries in A (which are used in the numerator 
in Cramer’s rule). 

A more rigorous analysis of errors needs to use the norm of the
coefficient matrix A. Recall that

\[ \|A\| = \max_{x \neq 0} \frac{|Ax|}{|x|} \]

Here |x| and |Ax| are the sizes of these vectors measured by some norm (the
euclidean norm, the sum norm, or the max norm, introduced in Section 2.5).
$\|A\|$ is the maximum magnifying effect that matrix multiplication can have
on a vector. Recall that the sum norm $\|A\|_s$ is simply the largest column sum
and the max norm $\|A\|_{mx}$ is the largest row sum. In both norms, the sums
are of the absolute values of the entries.

For any vector x,

\[ |Ax| \le \|A\| \cdot |x| \tag{27} \]


Suppose that E represents a matrix of errors (either in recording data
or roundoff errors): the true matrix A has become the matrix A + E. Then
let us see how changing A to A + E changes the solution to the matrix
equation Ax = b. Suppose that x is the solution to the correct equation and
x + e represents the solution to the altered equation. Thus we have

\[ Ax = b \quad\text{and}\quad (A + E)(x + e) = b \tag{28} \]

We now derive a bound on the relative size of e in terms of the relative
size of E, a bound of the form

\[ \frac{|e|}{|x + e|} \le c(A)\,\frac{\|E\|}{\|A\|} \tag{29} \]


We derive (29) by subtracting the first equation in (28) from the second
to obtain

\[ Ae + E(x + e) = 0 \tag{30a} \]

or

\[ Ae = -E(x + e) \tag{30b} \]

We assume that A is invertible (or else the solution is not unique). Then we
can solve (30b) for e.

\[ e = -A^{-1}E(x + e) \tag{31} \]

Taking norms in (31) and using (27), we have

\[ |e| = |A^{-1}E(x + e)| \le \|A^{-1}E\| \cdot |x + e| \tag{32} \]

Now we use the fact given in Section 2.5 that for any matrices A, B:
$\|AB\| \le \|A\| \cdot \|B\|$. With it, we have

\[ \|A^{-1}E\| \le \|A^{-1}\| \cdot \|E\| \tag{33} \]

Combining (33) with (32), we have

\[ |e| \le \|A^{-1}E\| \cdot |x + e| \le \|A^{-1}\| \cdot \|E\| \cdot |x + e| \tag{34} \]

Dividing by |x + e| yields the bound we were seeking.

\[ \frac{|e|}{|x + e|} \le \|A^{-1}\| \cdot \|E\| \tag{35} \]

Equation (35) can be rewritten as

\[ \frac{|e|}{|x + e|} \le \left(\|A^{-1}\| \cdot \|A\|\right)\frac{\|E\|}{\|A\|} \tag{36} \]

So the constant in (29) turns out to be $c(A) = \|A^{-1}\| \cdot \|A\|$. This product
c(A) is called the condition number of the matrix A. A small condition
number means that the matrix is well behaved and yields stable computations
during elimination, since by (36) small errors in A can only produce small
errors in the solution vector.

The condition number can also be shown to bound the effects on x of
an error in b, when A(x + e) = b + e'.

\[ \frac{|e|}{|x|} \le c(A)\,\frac{|e'|}{|b|} \tag{37} \]


The exact value of the condition number is dependent on which matrix 
norm we use. 


Example 5. Condition Number of
Canoe-with-Sail Problem


The canoe-with-sail equations in (26) have the coefficient matrix

\[ A = \begin{bmatrix} 1 & k \\ 1 & 1 \end{bmatrix} \tag{38} \]

We want to study the sensitivity of a solution x of Ax = b to
small errors in the value of k when k is close to 1. From the preceding
discussion, we compute the condition number $c(A) = \|A\|_s \|A^{-1}\|_s$,
using the sum matrix norm. The sum norm of A is the largest column
sum (remember absolute values). When k is close to 1, the sum of
each column is 2 or about 2; so we say that $\|A\|_s = 2$. Computing the
inverse of A (by the determinant-based inverse formula for a 2-by-2
matrix in Section 3.1), we have

\[ A^{-1} = \frac{1}{1 - k}\begin{bmatrix} 1 & -k \\ -1 & 1 \end{bmatrix} \tag{39} \]

Again with k close to 1, the sum of the absolute values of each
column in $A^{-1}$ is about 2/(1 - k). So let $\|A^{-1}\|_s = 2/(1 - k)$. Then

\[ c(A) = \|A\|_s \|A^{-1}\|_s = 2 \cdot \frac{2}{1 - k} = \frac{4}{1 - k} \tag{40} \]

For k = .75, c(A) = 4/(1 - .75) = 16, so (where $\|A\|_s = 2$)

\[ \frac{|e|_s}{|x + e|_s} \le c(A)\,\frac{\|E\|_s}{\|A\|_s} = 16\,\frac{\|E\|_s}{2} = 8\|E\|_s \tag{41} \]

Thus a small error of, say, 1/12 in the value of k, with $\|E\|_s = 1/12$
(the error matrix E for A is all 0's except for 1/12 in k's entry), could
lead to a large percentage error in the solution x + e of up to
8(1/12) = 67%. Recall that back in Section 1.2, changing k from
3/4 to 2/3 changed the solution of Ax = [5, 7] from x = [-1, 8]
to x' (= x + e) = [1, 6]. So in this case, $|e|_s/|x + e|_s$ =
(2 + 2)/(1 + 6) = 57%.
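
The numbers in Example 5 can be reproduced exactly with rational arithmetic. The Python sketch below is illustrative only; the function names are hypothetical helpers, and the matrix is the canoe-with-sail matrix of (26) with k = 3/4 and right side [5, 7].

```python
from fractions import Fraction as F


def sum_norm(M):
    """Sum norm of a matrix: the largest column sum of absolute values."""
    return max(sum(abs(row[j]) for row in M) for j in range(len(M[0])))


def inverse_2x2(M):
    """Determinant-based inverse of a 2-by-2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def solve_2x2(M, rhs):
    inv = inverse_2x2(M)
    return [inv[0][0] * rhs[0] + inv[0][1] * rhs[1],
            inv[1][0] * rhs[0] + inv[1][1] * rhs[1]]


A = [[F(1), F(3, 4)], [F(1), F(1)]]            # k = 3/4
cond = sum_norm(A) * sum_norm(inverse_2x2(A))  # c(A) = ||A|| * ||A^-1||
bound = cond * F(1, 12) / sum_norm(A)          # right side of (36) with ||E|| = 1/12

x = solve_2x2(A, [5, 7])                       # solution with k = 3/4
A_changed = [[F(1), F(2, 3)], [F(1), F(1)]]    # k changed to 2/3
x_new = solve_2x2(A_changed, [5, 7])
e = [p - q for p, q in zip(x_new, x)]
observed = sum(abs(v) for v in e) / sum(abs(v) for v in x_new)

print(cond, bound, observed)  # 16 2/3 4/7
```

The observed relative error 4/7 (about 57%) stays below the bound 2/3 (about 67%), as (36) requires.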
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For a geometric picture of why matrices with rows (or columns) that 
are almost equal are ill-conditioned (unstable) (see Exercise 4 of Section 
be 


Example 6. A Well-Conditioned System 


Consider the refinery system of equations introduced in Section 1.2
whose coefficient matrix is

\[ A = \begin{bmatrix} 20 & 4 & 4 \\ 10 & 14 & 5 \\ 5 & 5 & 12 \end{bmatrix} \]

In Example 5 of Section 3.3, we computed its inverse to be

\[ A^{-1} = \begin{bmatrix} .05958 & -.01166 & -.015 \\ -.03958 & .09167 & -.025 \\ -.00833 & -.03333 & .1 \end{bmatrix} \]

Taking the maximum (absolute value) of the column sums, we have
$\|A\|_s = 35$ (first column) and $\|A^{-1}\|_s = .14$ (third column). Thus the
condition number of A is

\[ c(A) = \|A\|_s \|A^{-1}\|_s = 35 \times .14 = 4.9 \]

This is a reasonably well conditioned matrix. For the demand vector
b = [500, 850, 1000] we have used in this refinery model, we found
(in Section 3.2) that the solution x of Ax = b was

\[ x_1 = 4\tfrac{7}{8}, \quad x_2 = 33\tfrac{1}{8}, \quad x_3 = 67\tfrac{1}{2} \]


If we change entry (2, 3) of A from 5 to 7 to get A', the error matrix
E [a matrix of all 0's except entry (2, 3), which is 2] has $\|E\|_s = 2$. The
error bound (36) gives

\[ \frac{|e|_s}{|x + e|_s} \le c(A)\,\frac{\|E\|_s}{\|A\|_s} = 4.9 \cdot \frac{2}{35} = .28 \tag{42} \]

So a 2/35 change in the norm of A can yield a 28% error in the norm of
the solution. Solving A'x' = b for the same b, one would obtain

\[ x_1 = 6.5, \quad x_2 = 19.7, \quad x_3 = 72.3 \]

Let x' = x + e, where $|x'|_s = |x + e|_s = 98.5$; then

\[ e = x' - x = [6.5 - 4.8,\; 19.7 - 33.2,\; 72.3 - 67.5] = [1.7, -13.5, 4.8] \]

with $|e|_s = 20$. Thus $|e|_s/|x + e|_s$ = 20/98.5 = .20, which is not far
from the maximum percentage error of .28 given in (42).
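
Example 6 can also be checked with exact rational arithmetic. The sketch below is only an illustration: it takes the refinery matrix with third row [5, 5, 12] (an assumption consistent with the stated first-column sum of 35 and with b = [500, 850, 1000]), inverts it by elimination by pivoting on [A | I], and compares the observed relative error with the bound (42). All function names are hypothetical helpers.

```python
from fractions import Fraction as F


def sum_norm(M):
    """Sum norm of a matrix: the largest column sum of absolute values."""
    return max(sum(abs(row[j]) for row in M) for j in range(len(M[0])))


def inverse(M):
    """Invert a square matrix by elimination by pivoting on [M | I]."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for k in range(n):
        # First row at or below k with a nonzero entry in column k
        # (raises StopIteration if the matrix is singular).
        r = next(i for i in range(k, n) if aug[i][k] != 0)
        aug[k], aug[r] = aug[r], aug[k]
        p = aug[k][k]
        aug[k] = [v / p for v in aug[k]]
        for i in range(n):
            if i != k:
                f = aug[i][k]
                aug[i] = [a - f * c for a, c in zip(aug[i], aug[k])]
    return [row[n:] for row in aug]


def mat_vec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]


A = [[20, 4, 4], [10, 14, 5], [5, 5, 12]]
b = [500, 850, 1000]
A_inv = inverse(A)
cond = sum_norm(A) * sum_norm(A_inv)     # 35 * .14
bound = cond * F(2, 35)                  # bound (42) with ||E|| = 2

A_changed = [[20, 4, 4], [10, 14, 7], [5, 5, 12]]   # entry (2, 3) changed to 7
x = mat_vec(A_inv, b)
x_new = mat_vec(inverse(A_changed), b)
e = [p - q for p, q in zip(x_new, x)]
observed = sum(abs(v) for v in e) / sum(abs(v) for v in x_new)

print(float(cond), float(bound), round(float(observed), 3))
```

Exact arithmetic gives a relative error of about .20, in agreement with the rounded hand computation above and below the bound of .28.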


In the Exercises the reader has the chance to examine how the solution
to Ax = b is affected by small changes in a coefficient of A, for a variety
of well-conditioned and ill-conditioned matrices.


Section 3.5 Exercises 


Summary of Exercises 

Exercises 1-12 concern the speed of elimination computations and
elimination on tridiagonal matrices. Exercises 13-17 involve choice of pivots.
Exercises 18-27 deal with the condition number of a matrix; it is assumed
that the sum norm is being used. (Note that the word problems in Exercises
24-26 use inverses that would have been computed in Exercises 15-17 of
Section 3.3.)


1. Let A be an 8-by-8 matrix. How many multiplications (approximately)
are required to perform each of the following operations?
(a) Compute $A^2$. (b) Solve Ax = b. (c) Compute $A^{-1}$.


2. Let A be an n-by-n matrix. How many multiplications (approximately)
are required to perform each of the following operations?
(a) Compute A?°. (b) Solve Ax = b. 
(c) Iterate x**" = Ax 10 times. | (d) Compute A™'. 


3. Let A be a 200-by-200 tridiagonal matrix. How large must k be so that 
squaring a k-by-k matrix takes as many multiplications as solving 
Ax = b? 


4. Solve the following tridiagonal systems of equations. 


(ati * my iy 
a hes aie 2 ee = = QO 
Seer wae ee = —| 
Xa Pek = Xs l 
Xe te = Ae of 
—X5 t+ WX = -2 

(b) x, + X = 


po oe A, eee = 
ye, OE ay ee 

3Xy te 2kgt Xs 

2kg + 3X5 + Xe = 

2x5 + 3x, = —4 


SoS © & ey. 
5. Repeat Exercise 4, part (a) with the last equation changed to
$-x_5 + x_6 = -2$. You now get a set of solutions. Express these solutions in
terms of $x_6$.


6. Find the stable distribution (as done in Example 1) for Markov chains
with the following transition matrices.


2 920 6.0 220000 
24300 0 +4 200 0 
0% 4400 it & £9.00 
lo 04440 i To%o 4-4 42:0 
0 & fe 4 y-0: 6.4.4 4 
0000 3 4 0000 34 } 
2000 0 4 0 DOH 
rs & 2 0 0 co! 458 O60 
04440 0 040400 
Olo0 4 4 40 100 4 0 4 0 
000 % 4 } Oi ONh. + 6 4 
0000 3 % 00) 0; 0.350 


7. Expand the frog Markov chain from 6 states to 20 states (all the middle
columns are like the middle columns of the current 6-state transition
matrix). Solve for the stable distribution.


8. Expand the frog Markov chain from 6 states to n states (all the middle
columns are like the middle columns of the current 6-state transition
matrix). Solve for the stable distribution.


9. Suppose that you have an n-by-n matrix that has a tridiagonal form
except for one row.

(a) Explain why if this row is the last row, the speed of solving
Ax = b with a tridiagonal matrix is barely affected.

(b) Explain why if this row is the first row, the speed of solving
Ax = b can become proportional to $n^2$ operations.


10. (a) Verify that finding the inverse of an n-by-n matrix will take about
$n^3$ multiplications (or divisions) using elimination by pivoting.
Hint: Because of special right sides, only $n^3/6$ steps are used on
right sides during the elimination procedure.

(b) Verify that finding the inverse of an n-by-n matrix will take about
$n^3$ multiplications (or divisions) using Gaussian elimination with
back substitution.


11. Show that for a band matrix with bandwidth w, Gaussian elimination
takes approximately $w^2 n$ multiplications (or divisions).
12. (a) Find $A^{-1}$ for the upper triangular matrix

Go NV = WwW 
—_ & nN 


2 
| 
0 
0 


So 2 OG — 


using back substitution [or equivalently, pivoting on the augmented
matrix [A I], except start with entry (4, 4), then entry (3, 3),
(2, 2), (1, 1)].

(b) Generalize the computation in part (a) to show that the inverse of 
an n-by-n upper triangular matrix requires about n+ multiplications 
(or divisions) to compute. 

(c) Generalize the computation in part (a) to show that the inverse of 
an upper triangular matrix is upper triangular. 


13. Solve by regular Gaussian elimination the following systems of
equations with three significant digits (i.e., 2.002 becomes 2.00 and .9996
is rounded to 1.00).


(a) O0Olx-— y= 1 (b) .OOlx + 2y 
3x + y=0 —2ye=— ¥ 


| 
l 


> Ae) OR 2235 
3 4x + y 


14. Re-solve the problems in Exercise 13 using the stable elimination rules
for rearranging rows and columns and scaling. Compare your solutions
to the ones obtained in Exercise 13.


15. Solve by regular Gaussian elimination the following systems of
equations with three significant digits.


(a) OOlx + y- z=I1 (b) x + OOly+z= 1 
x+2y+ 2=2 2x + OOly + z = 3 
—x— yr2z=3 —~eF Sy +2=0 


16. Re-solve the problems in Exercise 15 using the stable elimination rules
for rearranging rows and columns and scaling. Compare your solutions
to the ones obtained in Exercise 15.


17. Take the wrong answer x* you obtained for each problem Ax = b in
Exercise 13 and compute the right-side error ε = Ax* - b (the right
side obtained by Ax* minus the true right side). Solve Ae = ε (using
the stable elimination rules) and subtract the solution e* from x* to get
a more accurate answer x** = x* - e*. Compare the answer x** with
the answer you obtained for this problem in Exercise 14.


For Exercises 18-27 involving the condition number of a matrix, always 
use the sum norm. 


18. What is the condition number of the n-by-n identity matrix? 
19. Determine the condition number of the following matrices. Comment
on whether or not small errors in data of each matrix A can result in
large errors in the solution of Ax = b.


, 3 fg Rw eS 
5b [Saf 252) She 


20. In the matrix A in Example 5, if k is changed from .8 to .85, how large
a relative error in the solution to Ax = b is possible?


oe mle Cole 
oe ae 
I DK Cle 


Sm ¢ 


21. (a) In the refinery problem in Example 6, if entry (1, 1) is changed
from 20 to 10 (yielding matrix A'), how large a relative error in
the solution to A'x = b is possible?

(b) Solve the system A'x = b for the A' in part (a) and compare the
actual relative error with the relative error bound given in part (a).


22. (a) Compute the condition number of


i at 
A=|2 4 3 
1-1 2 


(b) Solve Ax = b, where b = [1, 2, 3]. 

(c) Suppose that we change entry (2, 1) from 2 to 1 to get a new A.
How large a relative change in solution of Ax = b is possible with
this change in A? [Use the condition number estimate (36).]

(d) Solve Ax = b for this new A, and compare the observed relative 
change to the one predicted in part (c). 

(e) The large condition number of A in part (a) means that this matrix
is close to being noninvertible, that is, that some combination of
two rows of A almost equals a third row—show that 4 of the sum 
of two of the rows almost equals the other row. 


23. Answer parts (a) and (b) using equation (37) in the text.

(a) If b = [2, 1] is changed to [2, 2], how large a change can this 
yield in the solution to Ax = b for A in Exercise 19 part (a)? 

(b) If b = [1, 2, 3] is changed to [2, 2, 3], how large a change can 
this yield in the solution to Ax = b for A in Exercise 22? 

(c) Derive the bound |e|/|x + e| < c(A){le’|/|b|} in equation (37) by 
following the reasoning in equations (30) and (32) to obtain |e| =< 
|A~'l] + le’|, then divide by |x| (= |b]/|/Al)). 


24. (This is a continuation of Exercise 17 of Section 3.2 and Exercise 15
of Section 3.3.) The staff dietician at the California Institute of
Trigonometry has to make up a meal with 600 calories, 20 grams of protein,
and 200 milligrams of vitamin C. There are three food types to choose
from: rubbery jello, dried fish sticks, and mystery meat. They have the
following nutritional content per ounce.

             Jello    Fish Sticks    Mystery Meat

Calories       10          50             200
Protein         1           3               2
Vitamin C      30          10               0


If there is at most a 5% error in (sum) norm of any column in this 
matrix of data A, how large a relative error can occur in solving the 
dietician’s problem? 


25. (This is a continuation of Exercise 18 in Section 3.2 and Exercise 16
in Section 3.3.) A furniture manufacturer makes tables, chairs, and
sofas. In one month, the company has available 300 units of wood, 350
units of labor, and 225 units of upholstery. The manufacturer wants a
production schedule for the month that uses all of these resources. The
different products require the following amounts of the resources.


Table Chair Sofa 


Wood 4 | 3 
Labor 3 
Upholstery “f 0 4 


If the amount of wood needed to make a table was accidentally entered 
as 3 instead of 4, how large a relative error in the solution to this 
production problem is possible? 


26. (This is a continuation of Exercise 20 of Section 3.2 and Exercise
17 of Section 3.3.) An investment analyst is trying to find out how
much business a secretive TV manufacturer has. The company makes
three brands of TV set: brand A, brand B, and brand C. The analyst
learns that the manufacturer has ordered from suppliers 450,000 type 1
circuit boards, 300,000 type 2 circuit boards, and 350,000 type 3 circuit
boards. Brand A uses 2 type-1 boards, 1 type-2 board, and 2 type-3
boards. Brand B uses 3 type-1 boards, 2 type-2 boards, and 1 type-3
board. Brand C uses 1 board of each type.

If there were a mistake in getting the type 1 circuit board orders
and the analyst thought 350,000 boards were ordered instead of 450,000
boards, how large a relative error in the solution to this TV production
problem is possible?


27. Show that for any invertible matrix A with condition number c(A) (using
the sum norm), c(A) ≥ 1.

Hint: Use the fact that $\|AB\| \le \|A\| \cdot \|B\|$.


A Sampling of Linear Models


Section 4.1 Linear Transformations in Computer Graphics


In this chapter we discuss some linear models in greater detail. Three of
these models were introduced in Chapter 1: Markov chains, linear
programming, and population growth. Using matrix algebra and solution
techniques learned in the previous chapters, we shall be able to analyze and
solve these linear models.

A general solution for most of the models in this chapter requires one 
to solve a system of linear equations; in matrix form, solve Ax = b. In 
solving some of these systems of linear equations, various theoretical diffi- 
culties will arise. Those problems will motivate the theory of solutions to 
systems of linear equations, which is discussed in Chapter 5. 

A common use of linear models is to predict the values of a set of
variables in the future as a linear function of the variables' current values.
A Markov chain is such a model, as was the rabbit-fox population model.
Other models of this type are presented in this chapter. These models assume
the matrix form

\[ w' = Aw \tag{1} \]

or more generally (in future examples)

\[ w' = Aw + b \tag{2} \]

where w is the vector of current values and w' the vector of future values.
In this section we give a geometric interpretation of (1) and (2).

A linear transformation T of the plane maps each point w = (x, y)
into a point w' = T(w), where T(w) = Aw, for some 2-by-2 matrix A. A
linear transformation in n dimensions is defined the same way (A is then an
n-by-n matrix). When linear transformations are programmed on a computer,
they can be used to move figures about and create the special visual effects
we have come to associate with computer graphics. In this section the reader
will learn how to build up complicated graphics effects out of simple
transformations.

It will sometimes be convenient in this section to drop matrix notation
and write w' = T(w) = Aw to represent the pair of linear equations

\[
\begin{aligned}
x' &= ax + by \\
y' &= cx + dy
\end{aligned}
\tag{3}
\]

A slightly more general transformation, corresponding to w' =
Aw + b, is an affine linear transformation.

\[
\begin{aligned}
x' &= ax + by + e \\
y' &= cx + dy + f
\end{aligned}
\tag{4}
\]

Clearly, all linear transformations are affine linear transformations (with
e = f = 0). Shortly, we will see that affine linear transformations lack
some very important properties that linear transformations have.

Figure 4.1b, c, and d show the effect of transformations $T_1$, $T_2$, and
$T_3$, respectively, on the square in Figure 4.1a whose corners are A =
(0, 0), B = (1, 0), C = (1, 1), D = (0, 1).

\[
T_1:\quad
\begin{aligned}
x' &= 2x + 4 \\
y' &= 3y + 2
\end{aligned}
\tag{5}
\]

\[
T_2:\quad
\begin{aligned}
x' &= \cos 45^\circ\, x - \sin 45^\circ\, y = .707x - .707y \\
y' &= \sin 45^\circ\, x + \cos 45^\circ\, y = .707x + .707y
\end{aligned}
\tag{6}
\]

\[
T_3:\quad
\begin{aligned}
x' &= x + y \\
y' &= y
\end{aligned}
\tag{7}
\]

Transformation $T_1$ doubles the width and triples the height of the square and
also moves it 4 units to the right and 2 units up. Transformation $T_2$ has the
effect of revolving the square 45° counterclockwise about the origin, but
does not change the square's size. Transformation $T_3$ slants the y-axis and
lines parallel to it by 45°. To help understand the effect of these
transformations, readers should evaluate $T_1$, $T_2$, and $T_3$ at point C (= [1, 1])
of the square in Figure 4.1a. Exercise 31 shows that revolving a point about the
origin always has the form (6).

Figure 4.1 (a) Unit square. (b) Square transformed by $T_1$. (c) Square transformed
by $T_2$. (d) Square transformed by $T_3$.
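
The suggested evaluation at the corners can be carried out with a few lines of code. The Python sketch below is an illustration only; it encodes the three transformations as described in the text ($T_1$ doubles the width, triples the height, and shifts by (4, 2); $T_2$ rotates by 45°; $T_3$ adds y to x), and the function names are hypothetical.

```python
import math


def T1(x, y):
    """Stretch by 2 horizontally and 3 vertically, then shift by (4, 2)."""
    return (2 * x + 4, 3 * y + 2)


def T2(x, y):
    """Rotate 45 degrees counterclockwise about the origin."""
    c, s = math.cos(math.radians(45)), math.sin(math.radians(45))
    return (c * x - s * y, s * x + c * y)


def T3(x, y):
    """Slant the y-axis and lines parallel to it by 45 degrees."""
    return (x + y, y)


square = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
for name, T in (("T1", T1), ("T2", T2), ("T3", T3)):
    images = {corner: tuple(round(v, 3) for v in T(*pt))
              for corner, pt in square.items()}
    print(name, images)
```

At corner C = (1, 1) this gives (6, 5) for $T_1$, approximately (0, 1.414) for $T_2$, and (2, 1) for $T_3$.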

There is an important computational question to ask about transforming
a square or any figure built out of line segments. Is it sufficient to compute
just the new coordinates of the corners, and then connect these new corners
with straight lines to obtain the full transformed figure? Fortunately, the
answer is yes. This result is a simple consequence of two basic laws of
matrix algebra. We state these laws in linear transformation form as a
theorem.

Theorem 1. Let T be a linear transformation with w' = T(w) and v' =
T(v). Then for any scalar constants r and s,
(i) T(w + v) = T(w) + T(v) = w' + v'
(ii) T(rw) = rT(w) = rw'
(iii) T(rw + sv) = rT(w) + sT(v) = rw' + sv'

When Theorem 1, parts (i) and (ii) are rewritten in matrix form with
T(w) = Aw, they become the familiar matrix laws A(w + v) = Aw +
Av and A(rw) = r(Aw). Theorem 1, part (iii) is just a combination of parts
(i) and (ii). Theorem 1 is not true for affine linear transformations T(w) =
Aw + b with b nonzero: parts (i), (ii), and (iii) all fail in general (see
Exercise 33 for counterexamples). Theorem 1, part (iii) generalizes to linear
combinations of three or more points.

If w and v are the two endpoints of a line segment L, any point t on
L can be written as a linear combination of w and v of the form

\[ t = rw + (1 - r)v, \quad \text{for some } r,\; 0 \le r \le 1 \tag{8} \]


The constant r is the fraction of the way that t lies from v toward w. For
example, if t were halfway between w and v, then r = .5. When w and v are
mapped by some linear transformation T to points w' and v', the line segment
between them will be all points of the form

\[ t' = rw' + (1 - r)v', \quad \text{for some } r,\; 0 \le r \le 1 \tag{9} \]

By Theorem 1, part (iii), we see that if t is as in (8), then t' [= T(t)] is the
expression in (9). So linear transformations map lines into lines. This result
is also true for affine linear transformations [it is easily verified using matrix
algebra (see Exercise 32)].


Theorem 2. An affine linear transformation T maps line segments into line 
segments. 


Theorem 2 allows us to compute transformations of straight-line figures 
simply by transforming corners of figures and then drawing lines between 
the transformed corners. 

In computer graphics applications, Theorem 1 says that if the
coordinates of corners, or other critical points, in a figure can be expressed as
linear combinations of the coordinates of some "key" points, then to
transform the figure we only need to apply the linear transformation to the
coordinates of these key points; the coordinates of other points can quickly be
obtained from the coordinates of the transformed key points.

Theorem 1 restated the basic fact of matrix algebra that vector addition
and scalar multiplication are preserved by matrix-vector multiplication.
Another almost-as-easy consequence from matrix algebra is the following
result.

Theorem 3. If $T_1$ and $T_2$ are two affine linear transformations, the
composite transformation $w'' = T_2(T_1(w))$, obtained by mapping w to
$w' = T_1(w)$ and then mapping w' to $w'' = T_2(w')$, is also an affine
linear transformation.


Proof. Let $T_1(w)$ be the mapping $w' = A_1 w + b_1$ and $T_2(w)$ be the
mapping $w' = A_2 w + b_2$. Then using matrix algebra, $T_2(T_1(w))$ can
be written

\[
\begin{aligned}
T_2(T_1(w)) &= A_2(A_1 w + b_1) + b_2 \\
&= A_2 A_1 w + A_2 b_1 + b_2
\end{aligned}
\tag{10}
\]

Then $T_2(T_1(w))$ is the affine linear transformation $T_3(w) = A_3 w + b_3$,
where

\[ A_3 = A_2 A_1 \quad\text{and}\quad b_3 = A_2 b_1 + b_2 \tag{11} \]
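
Formula (11) can be checked numerically. The following Python sketch is illustrative only: it stores an affine transformation as a hypothetical pair (A, b), builds the composite by (11), and compares it with applying the two maps one after the other. The sample maps are the stretch-and-shift and the slant used earlier in this section.

```python
def mat_vec(A, v):
    """2-by-2 matrix times a 2-vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def mat_mul(A, B):
    """Product of two 2-by-2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(T, w):
    """Apply the affine map T = (A, b) to w: Aw + b."""
    A, b = T
    Aw = mat_vec(A, w)
    return [Aw[0] + b[0], Aw[1] + b[1]]


def compose(T_second, T_first):
    """Composite map T_second(T_first(w)), using A3 = A2 A1, b3 = A2 b1 + b2."""
    (A2, b2), (A1, b1) = T_second, T_first
    A2b1 = mat_vec(A2, b1)
    return (mat_mul(A2, A1), [A2b1[0] + b2[0], A2b1[1] + b2[1]])


stretch_shift = ([[2, 0], [0, 3]], [4, 2])   # x' = 2x + 4, y' = 3y + 2
slant = ([[1, 1], [0, 1]], [0, 0])           # x' = x + y,  y' = y

w = [1, 1]
one_step = apply(compose(slant, stretch_shift), w)
two_steps = apply(slant, apply(stretch_shift, w))
other_order = apply(compose(stretch_shift, slant), w)
print(one_step, two_steps, other_order)  # [11, 5] [11, 5] [8, 5]
```

The first two results agree, as Theorem 3 asserts, while reversing the order of composition gives a different point.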


This theorem was easy to prove using matrix algebra. Without it, we
would have to substitute one system of equations for affine linear
transformation $T_1$ into another system of equations for affine linear
transformation $T_2$, a giant mess!

Theorem 3 lets us build up complicated transformations out of simpler
transformations that revolve, expand distances along the x- or y-axis, move
left (right) or up (down), slant axes, and make other changes. Another use is
in creating animated motion. If the rotation $T_2$ [in equation (6)] used an
angle of 1°, we would obtain a linear transformation that would revolve, say,
a square around the origin by 1°. If we repeatedly applied this transformation
360 times, we would obtain an "animated" sequence of figures that create one
full revolution of the square around the origin.
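
A minimal sketch of this animation idea, in Python, is given below. It is an illustration under simple assumptions: a single corner point is tracked rather than a whole square, floating-point arithmetic is used (so the final point returns to the start only up to roundoff), and the names `rotate` and `frames` are hypothetical.

```python
import math


def rotate(point, degrees):
    """Rotate a point counterclockwise about the origin by the given angle."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    x, y = point
    return (c * x - s * y, s * x + c * y)


point = (1.0, 0.0)             # one corner of a figure
frames = [point]
for _ in range(360):           # apply the 1-degree rotation 360 times
    point = rotate(point, 1)
    frames.append(point)

print(frames[90])              # close to (0, 1): a quarter turn
print(frames[360])             # close to (1, 0): one full revolution
```

Each entry of `frames` is one step of the animated sequence; a figure is animated by applying the same step to each of its corners.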


EET 
Example 1. Transforming a Set of Squares 


Draw a figure F consisting of a set of eight unit squares whose corners 
are at points (g, h), g = 1, 2, 3 and h = -2, -1, 0, 1, 2 (see Figure 
4.2a). Observe that the coordinates of all corner points are linear
combinations of the coordinates of the lower-left corner (1, -2) and the 
"change in coordinates" points (1, 0) and (0, 1). For example, the 
point (2, 2) = (1, -2) + 1(1, 0) + 4(0, 1). Now let us transform 
F first by applying the linear transformation $T_2(T_3(x, y))$ ($T_3$ followed 
by $T_2$), and second by applying the linear transformation $T_3(T_2(x, y))$. 
The transformed figures are drawn in Figure 4.2b and c. For convenience,
we restate $T_2$ and $T_3$ here: 

$$T_2:\quad \begin{aligned} x' &= \cos 45^\circ\, x - \sin 45^\circ\, y = .707x - .707y \\ y' &= \sin 45^\circ\, x + \cos 45^\circ\, y = .707x + .707y \end{aligned} \tag{6}$$

$$T_3:\quad \begin{aligned} x' &= x + y \\ y' &= y \end{aligned} \tag{7}$$

Figure 4.2 (a) Grid. (b) Grid transformed by $T_3$ followed by $T_2$.
(c) Grid transformed by $T_2$ followed by $T_3$.


We successively apply $T_2$ and $T_3$ in both orders to the key points 
(1, -2), (1, 0), and (0, 1). 

$T_2(T_3(1, -2)) = (.707, -2.121)$      $T_3(T_2(1, -2)) = (1.414, -.707)$
$T_2(T_3(1, 0)) = (.707, .707)$         $T_3(T_2(1, 0)) = (1.414, .707)$
$T_2(T_3(0, 1)) = (0, 1.414)$           $T_3(T_2(0, 1)) = (0, .707)$


We use Theorem 1, part (iii) to obtain the other corners as linear 
combinations of the transformed key points. For example, since the 
point (2, 2) = (1, -2) + 1(1, 0) + 4(0, 1), then 

$$\begin{aligned} T_2(T_3(2, 2)) &= T_2(T_3(1, -2)) + 1\,T_2(T_3(1, 0)) + 4\,T_2(T_3(0, 1)) \\ &= (.707, -2.121) + 1(.707, .707) + 4(0, 1.414) \\ &= (1.414, 4.242) \end{aligned}$$

Note that changing the order of the transformations changes 
the composite transformation. Composing linear transformations is 
not commutative! This is because matrix multiplication is not
commutative. 
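The numbers in Example 1 can be reproduced with a short sketch. It assumes $T_2$ is the 45° rotation and $T_3$ is the slant $x' = x + y$, $y' = y$; the function and variable names are ours.

```python
import math

c = s = math.sqrt(0.5)  # cos 45 degrees = sin 45 degrees, about .707

def T2(p):
    """Rotate 45 degrees counterclockwise about the origin."""
    x, y = p
    return (c * x - s * y, s * x + c * y)

def T3(p):
    """Slant: add the y-coordinate to the x-coordinate."""
    x, y = p
    return (x + y, y)

key_points = [(1, -2), (1, 0), (0, 1)]
t3_then_t2 = [T2(T3(p)) for p in key_points]  # T3 followed by T2
t2_then_t3 = [T3(T2(p)) for p in key_points]  # T2 followed by T3

# The corner (2, 2) = (1, -2) + 1(1, 0) + 4(0, 1), built from transformed key points.
a, b, d = t3_then_t2
corner_from_keys = (a[0] + b[0] + 4 * d[0], a[1] + b[1] + 4 * d[1])
corner_direct = T2(T3((2, 2)))
```

The two lists differ, which is the non-commutativity noted above, while the corner obtained from the key points matches the corner transformed directly.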


There are many geometric properties of figures that we might hope 
affine linear transformations would preserve: an angle at a corner, the area 
of a square, and the distance of each point from the origin. Each of the 
properties is satisfied by some but not all of $T_1$, $T_2$, $T_3$ defined above. With 
a little thought, the reader should be able to guess conditions on affine linear 
transformations which make each of these properties true. 

Another interesting property is reversibility: if u' = T(u), does there 
exist another transformation $T^{-1}$ such that u = $T^{-1}$(u')? That is, does T 
have an inverse? $T^{-1}$ should exist if the matrix used to define T has an 
inverse; details are left as an exercise. 

Next let us consider transformations to represent three-dimensional 
figures in two dimensions. Any time one draws a three-dimensional figure 
on a piece of paper or displays it on a computer screen, one is performing 
such a transformation. A transformation (x’, y’) = T(x, y, z) that maps a 
three-dimensional figure into two dimensions is really operating in three 
dimensions, but the z-coordinate always becomes zero [i.e., (x', y', 0) = 
T(x, y, z)]. 

The simplest type of linear transformation from three to two dimen- 
sions is a projection onto the x-y plane (or onto the x-z or y-z planes). This 
has the form 7(x, y, z) = (x, y)—just delete the z-coordinate. A more general 
projection would project three dimensional space onto some plane that is 
not parallel to any pair of coordinate axes. Figure 4.3 illustrates such a 
projection of two dimensions onto one dimension. Here we have projected 
the x-y plane onto the (one-dimensional) line x = y using the transformation 


$$\begin{aligned} x' &= .5x + .5y \\ y' &= .5x + .5y \end{aligned} \tag{12}$$

Figure 4.3 Projecting x-y plane onto line y = x.


The arrows in Figure 4.3 show how sample points are projected onto 
this line. When three-dimensional points are projected onto a plane, the 
situation is similar to Figure 4.3, but in three dimensions it is harder to 
illustrate with a figure. In Chapter 5 we will learn more about projection 
mappings. (We note as an aside that there is no way to reverse a projection, 
that is, there is no affine linear transformation that maps a line onto the 
whole plane, since by Theorem 2 lines are always mapped onto lines; sim- 
ilarly, a plane cannot be mapped linearly onto all of three-dimensional 
space.) 
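A small sketch makes the irreversibility concrete. It assumes the projection onto the line y = x sends (x, y) to (.5x + .5y, .5x + .5y); the sample points are arbitrary.

```python
def project(w):
    """Project a point of the x-y plane onto the line y = x."""
    x, y = w
    m = 0.5 * x + 0.5 * y
    return (m, m)

samples = [(3.0, 1.0), (0.0, 4.0), (-2.0, 5.0)]  # arbitrary sample points
images = [project(w) for w in samples]

# Projecting a second time changes nothing: the images already lie on the line.
projected_twice = [project(p) for p in images]
```

The first two sample points land on the same image, so no transformation could recover both of them from that image.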

The following two examples illustrate a standard way to map three 
dimensions into two, as well as a way to map two dimensions into three. 


Example 2. Projection of a Cube into the Plane 


Figure 4.4 (a) Three-dimensional unit cube. (b) Projection of unit cube into x-y 
plane. 


Devise a linear transformation that projects the three-dimensional unit 
cube shown in Figure 4.4a into the (x, y)-plane so that the cube looks 
just the way it is drawn on the (two-dimensional) page of this book in 
Figure 4.4a. In Figure 4.4a the z-axis is represented as a line at a 30° 
angle to the x-axis. Moreover, distances along the z-axis in Figure 4.4a 
are drawn with half the length of distances along the x- or y-axis. So 
the projection we want acts on a point (x, y, z) as follows: The 
z-coordinate should alter the x, y coordinates in the direction of a 30° 
angle above the x-axis and the distance of the displacement should be 
half the value of the z-coordinate. 

$$T':\quad \begin{aligned} x' &= x + \tfrac{1}{2}\cos 30^\circ\, z = x + .433z \\ y' &= y + \tfrac{1}{2}\sin 30^\circ\, z = y + .25z \end{aligned} \tag{13}$$

See Figure 4.4b.

Example 3. Revolve a Letter Around the x-Axis 


Devise a set of linear transformations that take the letter L (for Linear) 
in Figure 4.5a and revolve it 5° around the x-axis, then 10°, then 15°, 
and so on, around the x-axis, to make an animated movie of the L 
revolving around the x-axis. The revolution around the x-axis takes 
place in three dimensions. Let us agree to represent the transformed L 
in two dimensions using the projection $T'$ given by (13). That is, we 
treat the L as a figure in three dimensions, then successively revolve 
it 5° around the x-axis and display the results at each stage in two 
dimensions using $T'$. 


Figure 4.5 (a) Letter L. (b) Letter L revolved 50° about x-axis 
and projected back onto x-y plane. (c) Letter L revolved 50° 
about x-axis, then shrunk by $T_{10}$, and projected onto x-y plane. 
(d) Letter L revolved 150° about x-axis, then shrunk by $T_{30}$, and 
projected onto x-y plane. 

The corners of L are (1, 3), (1, 0), (2, 0), which in three
dimensions become (1, 3, 0), (1, 0, 0), (2, 0, 0). Revolving L around 
the x-axis is similar to revolving a figure in the (x, y)-plane around the 
origin; this is what $T_2$ did, at the beginning of this section. We keep 
the x-coordinate fixed and revolve the y- and z-coordinates the way $T_2$ 
did. Thus the required linear transformation T in three dimensions is 


$$T:\quad \begin{aligned} x' &= x \\ y' &= \cos 5^\circ\, y - \sin 5^\circ\, z \\ z' &= \sin 5^\circ\, y + \cos 5^\circ\, z \end{aligned} \tag{14}$$


First we apply T to the corners of L, (1, 3, 0), (1, 0, 0), 
(2, 0, 0), and then apply T again to the transformed corners and
continue applying T. Each time we apply T, we display L in two
dimensions by using the projection $T'$. The original L and the result after 
applying T 10 times, a 50° revolution (and then applying $T'$), are shown 
in Figure 4.5a and b. Note that T leaves corners (1, 0, 0) and 
(2, 0, 0) unchanged. 


The result w* of applying T 10 times to an initial 3-vector u could 
also be computed by multiplying u by the tenth power of the matrix of 
coefficients in (14). Of course, the smart way to obtain a 50° rotation is just 
to substitute cos 50° and sin 50° in (14). 
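The two routes to a 50° revolution can be compared directly. The sketch below assumes the rotation (14) with a general angle and the projection $T'$ of (13) with the rounded coefficients .433 and .25; the function names are ours.

```python
import math

def revolve_x(p, degrees):
    """Revolve a 3-D point about the x-axis, as in (14) with a general angle."""
    t = math.radians(degrees)
    x, y, z = p
    return (x, math.cos(t) * y - math.sin(t) * z,
            math.sin(t) * y + math.cos(t) * z)

def project(p):
    """Projection T' of (13): flatten a 3-D point onto the page."""
    x, y, z = p
    return (x + 0.433 * z, y + 0.25 * z)

corners = [(1.0, 3.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # corners of the L

stepped = corners
for _ in range(10):                       # ten successive 5-degree revolutions
    stepped = [revolve_x(p, 5) for p in stepped]

direct = [revolve_x(p, 50) for p in corners]   # one 50-degree revolution
picture = [project(p) for p in stepped]        # what is drawn on the page
```

The stepped and direct results agree up to roundoff, and the two corners on the x-axis do not move.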

There are several variations on the transformation in Example 3 that 
are used in computer graphics. In practice, we would probably want to 
transform a whole set of letters that spell out a word. We might shrink or 
magnify the letters by an amount r as they revolve; we use the transformation 
$(x'', y'', z'') = (rx', ry', rz')$ composed with T. We might want the letters 
to recede back into the distance, away from the (x, y)-plane. This can be 
accomplished by increasing the z-coordinate a little more each time and 
shrinking the letters a little (in all coordinates). The following affine linear 
transformation could be used after k applications of T. 

$$T_k:\quad (x'', y'', z'') = (.95^k x',\ .95^k y',\ .95^k z' + .1k) \tag{15}$$

Note that $T_k$ has the undesirable effect of shrinking the x-coordinate toward 
the left (toward the origin). Figure 4.5c shows the result of applying T 10 
times followed by $T_{10}$; Figure 4.5d shows the result of applying T 30 times 
followed by $T_{30}$ (again $T'$ is used to get planar depictions). 
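One frame of the receding letter can be sketched as follows, assuming $T_k$ scales every coordinate by $.95^k$ and adds $.1k$ to z, as in (15). For brevity the k revolutions are done in one step with the angle 5k°; the function names are ours.

```python
import math

def revolve_x(p, degrees):
    """Revolve a 3-D point about the x-axis."""
    t = math.radians(degrees)
    x, y, z = p
    return (x, math.cos(t) * y - math.sin(t) * z,
            math.sin(t) * y + math.cos(t) * z)

def recede(p, k):
    """T_k of (15): shrink all coordinates by .95**k and push z back by .1k."""
    x, y, z = p
    f = 0.95 ** k
    return (f * x, f * y, f * z + 0.1 * k)

def project(p):
    """Projection T' of (13)."""
    x, y, z = p
    return (x + 0.433 * z, y + 0.25 * z)

corners = [(1.0, 3.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

def frame(k):
    """Corners after a 5k-degree revolution, then T_k, then projection."""
    turned = [revolve_x(p, 5 * k) for p in corners]
    return [project(recede(p, k)) for p in turned]

frame_10 = frame(10)   # compare Figure 4.5c
frame_30 = frame(30)   # compare Figure 4.5d
```

The corner (2, 0, 0) shows the side effect mentioned above: its x-coordinate is pulled toward the origin by the factor $.95^k$ before the projection shifts it.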

Let us add a warning about roundoff errors. A small error in computing 
(14) (to revolve the letter L) may become a noticeable error after dozens of 
iterations of (14). One easy way to eliminate this type of error is on every 
tenth iteration to compute the new coordinates of the transformed L directly 
from the original coordinates, (1, 3), (1, 0), (2, 0), by performing a 50° 
rotation [as noted above, do this by using cos 50° and sin 50° in (14)]. This 
method of eliminating an accumulation of errors by "updating" is used 
frequently in many different types of linear models. 
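The updating idea can be imitated on a small scale. In this sketch, roundoff is exaggerated by keeping only three decimal places after every step (an assumption made so the effect is visible); every tenth step the refreshed copy is recomputed directly from the original corner.

```python
import math

def revolve_x(p, degrees):
    """Revolve a 3-D point about the x-axis."""
    t = math.radians(degrees)
    x, y, z = p
    return (x, math.cos(t) * y - math.sin(t) * z,
            math.sin(t) * y + math.cos(t) * z)

def chop(p, digits=3):
    """Crude stand-in for roundoff: keep only a few decimal places."""
    return tuple(round(c, digits) for c in p)

def max_error(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

start = (1.0, 3.0, 0.0)            # one corner of the L
drifting = refreshed = start
for k in range(1, 71):             # 70 steps of 5 degrees each
    drifting = chop(revolve_x(drifting, 5))
    if k % 10 == 0:
        # Update: recompute directly from the original corner.
        refreshed = chop(revolve_x(start, 5 * k))
    else:
        refreshed = chop(revolve_x(refreshed, 5))

exact = revolve_x(start, 350)
drift_error = max_error(drifting, exact)      # error carried through all 70 steps
refresh_error = max_error(refreshed, exact)   # error of a single rounding only
```

Right after an update the refreshed copy carries the error of only one rounding, whereas the other copy has carried its errors through all 70 steps.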
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Hopefully, Examples 1, 2, and 3 have given the reader a sense of how 
to generate a variety of transformations to move figures around in two and 
three dimensions. We have tried to provide the reader with the basic tools. 
We leave the creation of more interesting graphics transformations as 
projects for the readers (some simple graphics projects are suggested in the 
Exercises). Please note how extremely tedious animated graphics would be 
without computer programs to map sets of points repeatedly, as in Example 
3, and to build a complex transformation from a sequence of simple trans- 
formations, as permitted by Theorem 3. 

There is one very important problem in computer graphics that we 
have not discussed—the hidden surface problem. In Figure 4.4 we show all 
the corners and edges of the cube, but actually some are not visible because 
they lie at the back of the cube. The problem of determining which corners, 
edges, and surfaces of an object are visible is tricky and its solution relies 
heavily on linear algebra; even determinants are involved. 

In earlier chapters we saw how eigenvectors can simplify the computation
in iterating the system x' = Ax. We now give a geometric interpretation
of how eigenvectors simplify a linear transformation T(w) = Aw. 
Recall that an eigenvector u of a matrix A has the property that multiplying 
u by A has the effect of multiplying u by a scalar. That is, there is some 
scalar λ, called an eigenvalue of A, such that Au = λu. Since matrix A 
and the linear transformation T(w) = Aw are really "the same thing," 
we will speak interchangeably about a vector u being an eigenvector of A 
or of T. 

Eigenvectors allow us to break a linear transformation into simple 
parts. (This reverses our previous goal of building up complicated transfor- 
mations from simple ones.) The idea is to change to a coordinate system 
based on the eigenvectors of A, as was done in Example 8 of Section 2.5, 
and apply T in terms of this new coordinate system (in Section 3.3 we 
showed that converting to eigenvector coordinates was equivalent to writing 
A in the form $UD_\lambda U^{-1}$). 


Example 4. Eigenvector Coordinates to Simplify a 
Linear Transformation 


Consider the linear transformation T, 

$$T:\quad \begin{aligned} x' &= x + y \\ y' &= \tfrac{1}{2}x + \tfrac{3}{2}y \end{aligned} \tag{16}$$


The standard (x, y) coordinate system expresses a point as a linear 
combination of the point (1, 0)—the distance along the x-axis—and 
the point (0, 1)—the distance along the y-axis. That is, (x, y) = 
x(1, 0) + y(0, 1). Now let us use a coordinate system in which points 
are expressed as a linear combination of two eigenvectors of T (these 
are magically provided by the author). They are $u_1 = (1, 1)$ and 
$u_2 = (-2, 1)$. Applying T to $u_1$ and $u_2$, we have 

$$T(1, 1) = (2, 2) = 2(1, 1), \qquad T(-2, 1) = (-1, \tfrac{1}{2}) = \tfrac{1}{2}(-2, 1)$$


We want to give an arbitrary point (x, y) new coordinates (r, s) such 
that 


$$(x, y) = r(1, 1) + s(-2, 1) \tag{17}$$


Since T multiplies vector (1, 1) by 2, then by Theorem 1, 7 also 
multiplies vector r(1, 1) (for any r) by 2; similarly for multiples of 
(—2, 1). In a coordinate system (r, s) based on (1, 1) and (—2, 1), 
T becomes 


$$r' = 2r, \qquad s' = \tfrac{1}{2}s \tag{18}$$


It will require some work to convert a point in standard (x, y) 
coordinates [based on points (1, 0) and (0, 1)] to this new coordinate 
system based on (1, 1) and (—2, 1). But if T is to be applied repeat- 
edly, the simple form (18) of T in the new coordinates is worth the 
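effort of conversion. Writing (17) as a system of equations for the 
coordinates, we have 

$$\begin{aligned} x &= r - 2s \\ y &= r + s \end{aligned} \tag{19}$$

or

$$\mathbf{w} = E\mathbf{t}, \quad\text{where}\quad E = \begin{bmatrix} 1 & -2 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{t} = \begin{bmatrix} r \\ s \end{bmatrix}$$

Here E is the matrix whose columns are the eigenvectors. 

Solving (19) for r and s, we obtain 

$$\mathbf{t} = E^{-1}\mathbf{w}:\quad \begin{aligned} r &= \tfrac{1}{3}x + \tfrac{2}{3}y \\ s &= -\tfrac{1}{3}x + \tfrac{1}{3}y \end{aligned} \tag{20}$$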


Let us consider the effect of repeatedly applying T to the point 
(1, 0) [given in (x, y) coordinates]. Using (20), we convert (1, 0) to 
$(\tfrac{1}{3}, -\tfrac{1}{3})$ in (r, s) coordinates. Now we repeatedly apply T to
$(\tfrac{1}{3}, -\tfrac{1}{3})$ using (18). We get the sequence of points [in (r, s)
coordinates] 

$$(\tfrac{2}{3}, -\tfrac{1}{6}),\quad (\tfrac{4}{3}, -\tfrac{1}{12}),\quad (\tfrac{8}{3}, -\tfrac{1}{24}),\quad \ldots$$

Each application of T doubles the first coordinate and halves the second 
coordinate (see Figure 4.6). So after k applications of T we get 

$$(r_k, s_k) = \tfrac{1}{3}\left(2^k,\ -(\tfrac{1}{2})^k\right)$$

Figure 4.6 Repeated application of T in Example 4 to point p = (0, 1). Points 
$T^k(p)$ using s, t coordinates are shown. 


If we want to express the coordinates of this point $(r_k, s_k)$ back in the 
original (x, y) coordinates, we simply convert back with (19) to obtain 

$$\begin{aligned} x_k &= r_k - 2s_k = \tfrac{1}{3}2^k + \tfrac{2}{3}(\tfrac{1}{2})^k \\ y_k &= r_k + s_k = \tfrac{1}{3}2^k - \tfrac{1}{3}(\tfrac{1}{2})^k \end{aligned} \tag{21}$$

For large k, $2^k$ is much, much larger than $(\tfrac{1}{2})^k$, so we can neglect the 
latter term (the effects of computer roundoff will eventually drop the 
smaller term for us). 

In the long term, the largest eigenvalue always dominates the 
effects of all other eigenvalues (as was discussed in Section 2.5). In 
this case we have 

$$(x_k, y_k) \approx \tfrac{1}{3}(2^k, 2^k) \tag{22}$$


This simple result would have been much harder to obtain without 
the change of coordinates. Clearly, using eigenvector-based coordi- 
nates can make calculations with linear transformations much easier. 
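The shortcut of Example 4 can be checked against plain iteration. This sketch takes T to be $x' = x + y$, $y' = \tfrac{1}{2}x + \tfrac{3}{2}y$, the map having eigenvectors (1, 1) and (-2, 1) with eigenvalues 2 and 1/2; the function names are ours.

```python
def T(p):
    """The transformation of Example 4 in standard (x, y) coordinates."""
    x, y = p
    return (x + y, 0.5 * x + 1.5 * y)

def to_rs(p):
    """Convert (x, y) to eigenvector coordinates (r, s)."""
    x, y = p
    return ((x + 2 * y) / 3, (-x + y) / 3)

def from_rs(t):
    """Convert (r, s) back to (x, y): (x, y) = r(1, 1) + s(-2, 1)."""
    r, s = t
    return (r - 2 * s, r + s)

def T_power(p, k):
    """Apply T k times by scaling r by 2**k and s by (1/2)**k."""
    r, s = to_rs(p)
    return from_rs((2 ** k * r, 0.5 ** k * s))

point = (1.0, 0.0)
iterated = point
for _ in range(3):          # three direct applications of T
    iterated = T(iterated)

shortcut = T_power(point, 3)
```

Both computations give the same point, but the shortcut costs the same no matter how large k is.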

Note: The interested reader should consult one of the computer graphics
references at the end of the book for more information about this
fascinating application of linear algebra. 


Section 4.1 Exercises 


Summary of Exercises 

Exercises 1—24 call for the construction of various affine linear transfor- 
mations and plotting their effect on certain figures. Exercises 25-27 have 
eigenvector-coordinate computations. Exercises 28-33 involve associated 
theory. 


1. Construct affine linear transformations to do the following to the square 
in Figure 4.1a. 

(a) Rotate the square 180° counterclockwise around the origin (in the 
plane). 

(b) Move the square 7 units to the right, 3 units up, and double its 
width. 

(c) Make the vertical lines of the square slant at a 45° angle (height 
unchanged). 

(d) Reflect the square about the y-axis. 


2. Write out the affine linear transformations in Exercise 1 in the form 
u' = Au + b, giving A and b. 


3. Compute the new coordinates of the corners of the square in Figure 
4.1a for each of the transformations in Exercise 1. 


4. If T is the triangle with corners at (-1, 1), (1, 1), (1, -1), draw T 
after it is transformed by each of the transformations in Exercise 1. 


5. Apply each of the transformations of Exercise 1 to the grid in Figure 
4.2a and plot your answer. 


6. The following exercise verifies which types of affine transformations 
are commutative. All transformations act on the x-y plane. Let 


$T_a$, double the x-coordinate: x' = 2x, y' = y. 
$T_b$, double the y-coordinate: x' = x, y' = 2y. 
$T_c$, reflect about the y-axis: x' = -x, y' = y. 
$T_d$, reflect about the x-axis: x' = x, y' = -y. 
$T_e$, shift x-value 2 units: x' = x + 2, y' = y. 
$T_f$, shift y-value 3 units: x' = x, y' = y + 3. 
$T_g$, rotate 45° around the origin [= $T_2$ in (6)]. 


(a) Compute 7,7. and T.T,,. Do these two transformations commute? 
(b) Compute 7,7. and T,T,. Do these two transformations commute? 
(c) Compute 7,7; and T,T,,. Do these two transformations commute’? 
(d) Compute 7,7, and 7,7,. Do these two transformations commute? 
(e) Compute 7,7, and TT. Do these two transformations commute? 
(f) Compute 7.7; and T,T,. Do these two transformations commute’? 


(g) Compute 7,7, and T,T,. Do these two transformations commute? 


7. Construct affine linear transformations to do the following to the square 
in Figure 4.1a and plot the square after the transformation is performed. 

(a) Double the width of the square (double x coordinates) and rotate it 
90° around the origin. 

(b) Reflect the square about the y-axis and then reflect about the 
x-axis. 

(c) Move the square 7 units to the right, 3 units up, and then rotate 
the square 180° counterclockwise around the origin. 


8. Reflect the square in Figure 4.1a about the line y = x. Give your 
transformation. 


9. Reflect the square in Figure 4.1a about the line y = 2. Give your 
transformation. 


10. Rotate the square in Figure 4.1a 90° about the point (3, 0) by first 
moving the square so that its center is at the origin, then rotate it 90°, 
and finally reverse the initial move. Give your transformation. 


11. Rotate the grid in Figure 4.2a 90° counterclockwise about the point 
(G; I). Give your transformation. 
Hint: See Exercise 10. 


12. Rotate the square in Figure 4.1a 45° about its center. Give your 
transformation. 
Hint: See Exercise 10. 


13. By squaring the associated matrix, determine the transformation of
repeating twice the transformations in Exercise 7, parts (a) and (b). Plot 
the square in Figure 4.1a after applying each squared transformation. 
Finally, describe in words the effect of the squared transformations. 


14. Cube the matrix associated with the linear transformation in Exercise 
1, part (a) and verify that the resulting linear transformation is the same 
as the original one (rotating 180° three times is the same as rotating 
180° once). 


15. Square the matrix associated with the projection linear transformation 
$T'$ in (13) and verify that the square equals the original matrix. Explain 
this result in words. 


16. Devise an affine linear transformation to project any (x, y) point onto 
the following lines. 

(a) y = -x (b) y = 2x 

(c) y = x + 2 (d) y = 3x - 8 


17. (a) Verify that x' = 2x - y, y' = 2x - y projects any x-y point onto 
the line y' = x' and that the projection is not perpendicular like 
(12), but rather, the line segment from (x, y) to (2x - y, 2x - y) 
has a slope of 2. 

(b) Construct a projection T that maps any x-y point w onto the line y 
= 2x so that the line segment from w to T(w) has a slope of 1. 


18. Give the x-y coordinates of the corners of the following x-y-z figures 
after they are projected onto the x-y plane using projection $T'$ in (13) 
and draw the figures (in the x-y plane). 

(a) A triangle with corners (1, 1, 0), (1, 0, 1), (0, 1, 1). 

(b) A pyramid with base (0, 0, 0), (2, 0, 0), (0, 0, 2), (2, 0, 2) and 
top at (1, 2, 1). 


19. Construct a linear transformation to do the following to the letter L in 
Figure 4.5a. 

(a) Revolve it 30° around the y-axis. 

(b) Revolve it 10° around the z-axis. 

(c) Revolve it 30° around the y-axis and then 30° around the z-axis. 


20. Give the composite linear transformation of performing the revolution 
T in Example 3 followed by the projection $T'$ in (13). 


21. Revolve the grid in Figure 4.2a 30° around the x-axis and project onto 
the x-y plane [using $T'$ in (13)] (plot this). 


22. Revolve the square in Figure 4.1a 60° around the x-axis (in three
dimensions), then shrink all coordinates to half-size and project it onto the 
x-y plane using $T'$ in (13). 


23. (a) Construct a linear transformation to revolve an object 30° around 
the line of points (x, 1, 0) (the line is parallel to the x-axis with 
y = 1, z = 0). 
Hint: See Exercise 8. 

(b) Apply the linear transformation in part (a) to the grid in Figure 
4.2a and project the result onto the x-y plane with projection $T'$ in 
(13). 


24. Construct a linear transformation to make an animated movie in which 
in each successive frame the object 

(a) Revolves 30° around the y-axis and shrinks its x- and y-coordinates 
by 10%. 

(b) Revolves 10° around the y-axis, shrinks all coordinates by 10%, 
and then moves 2 units along the z-axis. 
25. Repeat Example 4 to find the approximate point after k transformations 
using T in (16) if the initial point is 
@G,2) ©) © (-2,-) 


26. Find the eigenvalues and associated eigenvectors of the following linear 
transformations (use the method in Section 3.1). 

(a) x' = 3x + y, y' = 2x + 2y (b) $T'$ in (13) 

(c) $T_2$ in (6) 
Hint: Eigenvalues are complex. 

(d) $T_3$ in (7) (e) $T_1$ in (5) 


27. Use your results in Exercise 26, part (a) to find the result of applying 
that transformation 5 times to the point [5, 2] and the approximate value 
of applying it 20 times. 


28. This exercise gives a "picture" of how, when two columns of A are 


almost the same, the inverse of A almost does not exist. For the fol- 


| 
lowing matrices A, solve the system A a = i Then plot 
x4 


x,a¢ and x,a$ in a two-dimensional coordinate system and show geo- 
c ee & ac 
metrically how the sum of vectors x,ay; and x,a5 is 0 (here ay, a5 


denote the two columns of A). 


£3 z 3 » 6 $s. 9 
(a) k 4 (b) |; " (c) j 4 (d) ‘ | 


29. A linear transformation w' = T(w) = Aw can be reversed if A is 
invertible. Then the reverse transformation is $T^*(\mathbf{w}) = A^{-1}\mathbf{w}$. Find the 
reverse transformation, if possible, for 

(a) $T_3$ in (7) (b) $T_2$ in (6) (c) $T'$ in (13) 


30. An affine linear transformation w' = T(w) = Aw + b can be reversed 
if A is invertible. 

(a) In matrix notation, what is the inverse transformation $T^*(\mathbf{w})$ for 
T(w) = Aw + b (so that $T^*T$ is the identity transformation)? 

(b) Find the inverse transformation for $T_1$ in (5). 


31. Consider the point u = (1, 0) and suppose that we want to rotate it θ° 
counterclockwise around the origin. Then its new position will be
distance 1 from the origin and at an angle of θ° (with respect to the 
x-axis). Using a similar argument for the point v = (0, 1), show that 
for u' = Au and v' = Av to have the right values, A must be 

$$A = \begin{bmatrix} \cos\theta^\circ & -\sin\theta^\circ \\ \sin\theta^\circ & \cos\theta^\circ \end{bmatrix}$$


32. Verify that affine linear transformations map lines into lines by showing 
that if u' = T(u) = Au + b, points of the form w = ru + (1 - r)v are 
mapped into points of the form w' = ru' + (1 - r)v', where u' = 
T(u) and v' = T(v), and 0 ≤ r ≤ 1. 


33. Make up counterexamples to show that Theorem 1, parts (i) and (ii), 
are false for affine linear transformations (virtually any affine example 
will do). 


Section 4.2 Linear Regression 


One of the fundamental problems in building linear models is estimating 
coefficients and other constants in the linear equations. For example, in the 
refinery model introduced in Section 1.2 we stated that from 1 barrel of 
crude oil the first refinery would produce 20 gallons of heating oil, 10 gallons 
of diesel oil, and 5 gallons of gasoline. These production levels would vary 
from one batch of crude oil to another and might also vary depending on 
how much crude oil was being processed each day. The numbers given are 
estimates, not precise values. The first important work in linear algebra grew 
out of an estimation problem. 

In 1818 the famous mathematician Karl Friedrich Gauss was commis- 
sioned to make a geodetic survey of the kingdoms of Denmark and Hanover 
(geodetic surveys create very accurate maps of a portion of the earth’s spher- 
ical surface). In making estimates for the positions of different locations on 
a map, Gauss developed the least-squares theory of regression that we 
present in this section. This theory yields a system of linear equations that 
must be solved. To solve them, Gauss invented the algorithm we now call 
Gaussian elimination, presented in Section 3.2. This method is still the best 
way known to solve systems of linear equations. It should be noted that not 
only did this survey project cause Gauss to start the theories of statistics and 
linear algebra, but to compensate for the slightly nonspherical shape of the 
earth, Gauss was also led to develop the theory of differential geometry! 
The following equation summarizes this paragraph. 


one good application + one genius = important new mathematics (*) 


Let us return humbly to the problem of estimating coefficients. Recall 
the linear model from Example 2 of Section 1.4 for predicting C, the college 
grade average of a Scrooge High School graduate, in terms of the student’s 
Scrooge High average S. The proposed model was 


C=. 1.1X%85 — 9 (1) 


The constants in (1) were chosen to "fit" as closely as possible data about 
eight graduates. The heart of these models is the choice of the constants. 

Let us restate the problem of finding a linear model such as (1) in the 
following standardized form: 

Linear Regression Model. Given a set of points $(x_1, y_1), (x_2, y_2),
\ldots, (x_n, y_n)$, find constants q and r such that the linear relation 

$$\hat{y} = qx + r \tag{2}$$

gives the best possible fit for these points. 


The point $(x_i, \hat{y}_i)$ is the estimate for $(x_i, y_i)$. The name "regression," 
which means movement back to a less developed state, comes from the idea 
that our model recaptures a simple relationship between the $x_i$ and the $y_i$ 
which randomness has obscured (the variables regress to a linear
relationship). A model involving several input variables is called multiple
linear regression. If we try to fit the data to a more complex function, such as 
$\hat{y} = q_2x^2 + q_1x + r$ or $\hat{y} = e^{qx}$, the model is called nonlinear regression. 
We shall concentrate first on simple linear regression. Once this is well 
understood, we can extend our analysis to the multiple regression problem 
(using matrix algebra). We shall also show how some problems in nonlinear 
regression can be transformed into linear regression problems. 


Example 1. Using the Model $\hat{y} = qx$ 


Let us consider a very simple regression problem. Suppose that we 


want to fit the three points (0, 1), (2, 1), and (4, 4) to a line of the 
form 


$$\hat{y} = qx \tag{3}$$


(see Figure 4.7). The x-value might represent the number of semesters 
of college mathematics a student has taken and the y-value the student’s 
score on some test. There are thousands of other settings that might 
give rise to these values. The estimate (3) would help us predict the 
y-values for other x-values, for example, predict how other students 
might do on the test based on the amount of mathematics they have 
taken. 

Figure 4.7 Regression estimates for points (0, 1), (2, 1), and (4, 4). 

The three points in this problem are readily seen not to lie on a 
common line, much less a line through the origin [any line of the form 
(3) passes through the origin]. So we have to find a choice of q that 
gives the best possible fit, that is, a line $\hat{y} = qx$ passing as close to 
these three points as possible. 

What do "best possible fit" and "as close as possible" mean? 
The most common approach used in such problems is to minimize the 
sum of the squares of the errors. The error at a point $(x_i, y_i)$ will be 
$|\hat{y}_i - y_i| = |qx_i - y_i|$, the absolute difference between the value $qx_i$ 
predicted by (3) and the true value $y_i$ (the absolute value is needed so 
that a "negative" error and a "positive" error cannot offset each 
other). However, absolute values are not easy to use in mathematical 
equations. Taking the squares of differences yields nonnegative numbers 
without using absolute values. There is also a geometric reason we 
shall give for using squares. 

For the points (0, 1), (2, 1), (4, 4), the expression $\sum (\hat{y}_i - y_i)^2$ 
for the sum of squares of the errors (SSE) is 

$$\begin{aligned} \text{SSE} &= (0q - 1)^2 + (2q - 1)^2 + (4q - 4)^2 \\ &= 1 + (4q^2 - 4q + 1) + (16q^2 - 32q + 16) \\ &= 20q^2 - 36q + 18 \end{aligned} \tag{4}$$


The geometric justification for using a sum of squares is based 
on the following interpretation of our estimation problem. Let x be the 
vector of our x-values and y be the vector of our corresponding 
y-values. In this case, x = [0, 2, 4] and y = [1, 1, 4]. Further, let $\hat{\mathbf{y}}$ 
be the vector of estimates for y. Equation (3) can now be rewritten 

$$\hat{\mathbf{y}} = q\mathbf{x} \tag{5}$$

That is, the estimates $\hat{\mathbf{y}} = [\hat{y}_1, \hat{y}_2, \hat{y}_3]$ from (3) will be q times the 
x-values [0, 2, 4]. 

Think of x, y, and $\hat{\mathbf{y}}$ as points in three-dimensional space, where 
$\hat{\mathbf{y}}$ is a multiple of x. Then the obvious strategy is to pick the value of 
q that makes qx (= $\hat{\mathbf{y}}$) as close as possible to y (see Figure 4.8). That 
is, we want to minimize the distance $|q\mathbf{x} - \mathbf{y}|_e$ (in the euclidean norm) 
in three-dimensional space between qx and y. This distance between 
qx = [0q, 2q, 4q] and y = [1, 1, 4] is simply 

$$|q\mathbf{x} - \mathbf{y}|_e = \sqrt{(0q - 1)^2 + (2q - 1)^2 + (4q - 4)^2} \tag{6}$$

Comparing (4) and (6), we see that $|q\mathbf{x} - \mathbf{y}|_e$ is the square root of 
SSE. So minimizing SSE will also minimize the distance $|q\mathbf{x} - \mathbf{y}|_e$. 

Recall that in vector notation $|\mathbf{a}|^2$ equals $\mathbf{a} \cdot \mathbf{a}$. So 


$$\text{SSE} = |q\mathbf{x} - \mathbf{y}|_e^2 = (q\mathbf{x} - \mathbf{y}) \cdot (q\mathbf{x} - \mathbf{y}) \tag{7}$$

The value of q that minimizes $20q^2 - 36q + 18$ is found by 
differentiating this expression with respect to q and setting this
derivative equal to 0: 

$$\frac{d\,\text{SSE}}{dq} = 40q - 36 = 0 \quad\Rightarrow\quad q = .9 \tag{8}$$

The desired regression equation is thus $\hat{y} = .9x$ (see Figure 4.7). Using 
our regression equation $\hat{y}_i = .9x_i$, our estimate for (0, 1) is (0, 0), for 
(2, 1) is (2, 1.8), and for (4, 4) is (4, 3.6) [the bad estimate for 
(0, 1) arose from the fact that any line $\hat{y} = qx$ must go through the 
origin]. So SSE = $1^2 + .8^2 + .4^2 = 1.80$, and the distance in 
3-space between our estimate vector $\hat{\mathbf{y}}$ and the true y is
$|\hat{\mathbf{y}} - \mathbf{y}|_e = \sqrt{\text{SSE}} = \sqrt{1.80} = 1.34$. 
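Example 1 can be rechecked with a few lines of code. The sketch below evaluates SSE for the three data points and uses the value of q obtained by setting the derivative of $\sum (qx_i - y_i)^2$, which is $2q\sum x_i^2 - 2\sum x_iy_i$, equal to zero; the names are ours.

```python
xs = [0, 2, 4]
ys = [1, 1, 4]

def sse(q):
    """Sum of squared errors of the line y = q*x on the three data points."""
    return sum((q * x - y) ** 2 for x, y in zip(xs, ys))

# Zero of the derivative 2q*sum(x^2) - 2*sum(x*y); here 40q - 36 = 0.
q_best = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

estimates = [q_best * x for x in xs]   # estimated y-values at x = 0, 2, 4
smallest_sse = sse(q_best)
```

Trying nearby slopes such as .8 or 1.0 gives a larger SSE, as expected at a minimum.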


Readers should pause a moment to get their geometric bearings.
Example 1 started as a problem of estimating a relationship between some 
x- and y-values that we plotted in Figure 4.7 in x-y space. But then we 
considered a new geometric picture with three-dimensional vectors, formed 
by the x-values, the y-values, and the $\hat{y}$ estimates. Let us present the data 
in a matrix: 


                  x-value   y-value 
First reading        0         1 
Second reading       2         1 
Third reading        4         4 

In this new setting, our objective is to find a multiple q of the first 
column x that would estimate the second column y as closely as possible. 
Geometrically, we want to find a point p on the line formed by multiples
of x (= [0, 2, 4]), that is, p = qx for some q, such that p is as close to 
y (= [1, 1, 4]) as possible. Point p (= $\hat{\mathbf{y}}$) is the projection of y onto the 
line from the origin through x. 
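The projection picture can be verified numerically. In this sketch, p is the multiple of x with $q = (\mathbf{x} \cdot \mathbf{y})/(\mathbf{x} \cdot \mathbf{x})$, and we check that the error vector y - p is perpendicular to x, which is the condition that the derivative of SSE is zero; the helper names are ours.

```python
import math

x = [0.0, 2.0, 4.0]
y = [1.0, 1.0, 4.0]

def dot(a, b):
    """Scalar (dot) product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

q = dot(x, y) / dot(x, x)                      # multiple of x closest to y
p = [q * xi for xi in x]                       # the projection of y onto the line through x
residual = [yi - pi for yi, pi in zip(y, p)]   # error vector y - p

perpendicular_check = dot(residual, x)         # should be 0 up to roundoff
distance = math.sqrt(dot(residual, residual))  # euclidean distance from y to p
```

The distance agrees with the value 1.34 found in Example 1.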


Example 2. Using the Model $\hat{y} = qx + r$ 


Let us fit the points in Example 1, (0, 1), (2, 1), (4, 4), using the full 
linear regression model (2): $\hat{y} = qx + r$. We can also write our 
regression model as 

$$\hat{\mathbf{y}} = q\mathbf{x} + r\mathbf{1}:\quad \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \hat{y}_3 \end{bmatrix} = q\begin{bmatrix} 0 \\ 2 \\ 4 \end{bmatrix} + r\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \tag{9}$$

We want to find an estimate vector $\hat{\mathbf{y}}$ as close to the vector y of 
y-values as possible in 3-space. Now $\hat{\mathbf{y}}$ is formed from a linear
combination of the vectors x and 1. (The set of all possible linear
combinations of x and 1 will be a plane.) 


Again we pick q and r by minimizing the sum of squares of 
errors, 

$$\begin{aligned} \text{SSE} = |\hat{\mathbf{y}} - \mathbf{y}|^2 &= \sum (\hat{y}_i - y_i)^2 = \sum (qx_i + r - y_i)^2 \\ &= (0q + r - 1)^2 + (2q + r - 1)^2 + (4q + r - 4)^2 \\ &= (r^2 - 2r + 1) + (4q^2 + r^2 + 4qr - 4q - 2r + 1) \\ &\quad + (16q^2 + r^2 + 8qr - 32q - 8r + 16) \\ &= 20q^2 + 3r^2 + 12qr - 36q - 12r + 18 \end{aligned} \tag{10}$$


To minimize (10) with respect to q and r, we differentiate with respect
to q and r and set the partial derivatives equal to 0.

It is left as an exercise for the reader to verify that the partial
derivatives of SSE in (10) with respect to q and r are

$$\begin{aligned}
\frac{\partial\,\text{SSE}}{\partial q} &= 40q + 12r - 36 = 0 \quad\text{or}\quad 40q + 12r = 36 \\
\frac{\partial\,\text{SSE}}{\partial r} &= 12q + 6r - 12 = 0 \quad\text{or}\quad 12q + 6r = 12
\end{aligned} \tag{11}$$

Solving the pair of equations in (11) for q and r, we obtain

$$q = .75, \qquad r = .5 \tag{12}$$

So our regression equation is

$$\hat{y} = .75x + .5 \tag{13}$$

(see Figure 4.7). This time our estimate for (0, 1) is (0, .5), for
(2, 1) is (2, 2), and for (4, 4) is (4, 3.5), and SSE = $.5^2 + 1^2 + .5^2 = 1.50$.
The distance $|\hat{\mathbf{y}} - \mathbf{y}|$ between our estimate vector $\hat{\mathbf{y}}$ and
the true y-value vector $\mathbf{y}$ is $\sqrt{\text{SSE}} = \sqrt{1.50} = 1.22$. In Example 1
the distance was 1.34. Thus, for these data, our fuller model provided
little improvement over the simple model $\hat{y} = qx$.
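
As a check on the arithmetic of Example 2, here is a short Python sketch (our own illustration, not from the text). It solves the pair of equations (11) by Cramer's rule and compares the SSE of the two models. The helper names `solve_2x2` and `sse` are hypothetical.

```python
import math

# Data from Examples 1 and 2.
xs = [0.0, 2.0, 4.0]
ys = [1.0, 1.0, 4.0]

def solve_2x2(a, b, e, c, d, f):
    """Solve a*q + b*r = e, c*q + d*r = f by Cramer's rule."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - e * c) / det

def sse(q, r):
    """Sum of squared errors of the line y = q*x + r on the data."""
    return sum((q * x + r - y) ** 2 for x, y in zip(xs, ys))

# Equations (11): 40q + 12r = 36 and 12q + 6r = 12.
q, r = solve_2x2(40.0, 12.0, 36.0, 12.0, 6.0, 12.0)

print("q, r =", q, r)
print("SSE, distance (full model):", sse(q, r), math.sqrt(sse(q, r)))
print("SSE, distance (y = .9x):   ", sse(0.9, 0.0), math.sqrt(sse(0.9, 0.0)))
```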


If we were applying linear regression models (2) or (3) to n points,
the x- and y-values would form n-vectors and the distance $|\hat{\mathbf{y}} - \mathbf{y}|$ would be
calculated in n dimensions (the reasoning is the same). These calculations
would be quite tedious by hand, but nowadays we have computers to handle
the tedium. With the following observation, it is as easy to program a
computer to do regression on n points as it is on three points.


Proposition. The derivative of a sum of functions is the sum of the deriv- 
atives of each function. 


This proposition greatly simplifies taking derivatives to find the
minimum of SSE. In the model $\hat{y} = qx$, SSE has the following form for the set
of points $(x_i, y_i)$, $i = 1, 2, \ldots, n$:

$$\text{SSE} = \sum (qx_i - y_i)^2 = \sum (q^2x_i^2 - 2qx_iy_i + y_i^2) \tag{14}$$

By the proposition, the derivative of SSE is the sum of the derivatives of
the individual terms in (14):

$$\frac{d\,\text{SSE}}{dq} = \sum (2x_i^2q - 2x_iy_i) = 2\Big(\sum x_i^2\Big)q - 2\sum x_iy_i \tag{15}$$

Setting (15) equal to 0 (to find the minimizing value of q), we obtain


Formula for Regression Model $\hat{y} = qx$

$$q = \frac{\sum x_iy_i}{\sum x_i^2} \tag{16}$$


Let us informally rederive (16) using matrix algebra. Recall that
SSE = $(q\mathbf{x} - \mathbf{y}) \cdot (q\mathbf{x} - \mathbf{y})$. In vector calculus, we treat
$(q\mathbf{x} - \mathbf{y}) \cdot (q\mathbf{x} - \mathbf{y})$ like $(qx - y)^2$, and SSE's derivative is

$$\frac{d\,\text{SSE}}{dq} = \frac{d}{dq}\,(q\mathbf{x} - \mathbf{y}) \cdot (q\mathbf{x} - \mathbf{y}) = 2\mathbf{x} \cdot (q\mathbf{x} - \mathbf{y})$$

If we set this expression equal to zero, we get

$$2q\,\mathbf{x} \cdot \mathbf{x} - 2\,\mathbf{x} \cdot \mathbf{y} = 0 \quad\text{or}\quad q\,\mathbf{x} \cdot \mathbf{x} = \mathbf{x} \cdot \mathbf{y}$$

Hence we have directly $q = \mathbf{x} \cdot \mathbf{y} / \mathbf{x} \cdot \mathbf{x}$.

The simple form of the formula (16) for q was not apparent in the
calculations we did in Example 1. If we were working with 10 or more
points, it would clearly be much easier to do the general calculations just
performed to obtain (16) and then plug in the given x- and y-values, rather
than to multiply out all the squared factors in SSE and collect terms, as was
done in equation (4) in Example 1. General symbolic computations can make
life a lot easier! This is what mathematics is all about. The details of an
individual problem often hide a nice general structure for solutions. In
programming terms, it is often easier to write a computer program to solve a
general class of problems and use it to solve one specific problem, rather
than write a specialized program for the single problem.
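
As a quick worked check (our own addition), plugging the three points of Example 1, (0, 1), (2, 1), (4, 4), into formula (16) reproduces the slope found there:

```latex
q = \frac{\sum x_i y_i}{\sum x_i^2}
  = \frac{0 \cdot 1 + 2 \cdot 1 + 4 \cdot 4}{0^2 + 2^2 + 4^2}
  = \frac{18}{20} = .9
```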

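The calculations for the regression model $\hat{y}_i = qx_i + r$ are obtained
similarly by generalizing the equations in Example 2. The partial derivatives
of SSE can be shown to have the form (see Exercise 10)

$$\begin{aligned}
\frac{\partial\,\text{SSE}}{\partial q} &= 2\Big(\sum x_i^2\Big)q + 2\Big(\sum x_i\Big)r - 2\sum x_iy_i \\
\frac{\partial\,\text{SSE}}{\partial r} &= 2\Big(\sum x_i\Big)q + 2nr - 2\sum y_i
\end{aligned} \tag{17}$$

We set the derivatives equal to 0 and solve this pair of equations for q and
r. That is, we solve the equations

$$\begin{aligned}
aq + br &= e \\
cq + dr &= f
\end{aligned} \tag{18}$$

where

$$\begin{aligned}
a &= 2\sum x_i^2, & b &= 2\sum x_i, & e &= 2\sum x_iy_i \\
c &= 2\sum x_i, & d &= 2n, & f &= 2\sum y_i
\end{aligned}$$

The solution is


Formula for Regression Model $\hat{y} = qx + r$

$$q = \frac{n\sum x_iy_i - \big(\sum x_i\big)\big(\sum y_i\big)}{n\sum x_i^2 - \big(\sum x_i\big)^2}, \qquad
r = \frac{\big(\sum y_i\big)\big(\sum x_i^2\big) - \big(\sum x_i\big)\big(\sum x_iy_i\big)}{n\sum x_i^2 - \big(\sum x_i\big)^2} \tag{19}$$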


Again we note the advantage of generality. Solving a pair of linear
equations in terms of constants a, b, and so on, and then substituting complex
expressions for the constants of (18), is much easier than directly solving
the specific system of equations arising from (17).

Although the formulas in (19) are certainly complicated, they are still
quite simple to program. In fact, because linear regression is so widely used,
many (nonprogrammable) hand-held calculators have built-in routines to
calculate q and r. One simply enters successive $(x_i, y_i)$ pairs. When a pair is
entered, the calculator updates sums that it is computing for $\sum x_i$, $\sum x_i^2$,
$\sum y_i$, and $\sum x_iy_i$. After the last pair is entered, the user presses a Regression
key and the calculator inserts these sums into the formulas in (19). The
following BASIC program shows how the calculator works.


1 N = 0: SX = 0: SX2 = 0: SY = 0: SXY = 0
5 REM READ EACH (X, Y) PAIR AND UPDATE THE RUNNING SUMS
10 INPUT X, Y
20 N = N + 1
30 SX = SX + X
40 SX2 = SX2 + X*X
50 SY = SY + Y
60 SXY = SXY + X*Y
70 IF Regression key not pushed THEN GOTO 10
90 REM FORMULAS (19): D IS THE COMMON DENOMINATOR
100 D = N*SX2 - SX*SX
120 PRINT "Q = "; (N*SXY - SX*SY)/D
130 PRINT "R = "; (SY*SX2 - SX*SXY)/D
140 END
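
For readers without a BASIC interpreter, the same running-sum scheme can be sketched in Python. This is our own rendering, not the author's program: the function name `fit_line` is hypothetical, the list of pairs stands in for the keyed-in data, and the check for a zero denominator is an added safeguard.

```python
def fit_line(points):
    """Return (q, r) for the least-squares line y = q*x + r via formulas (19)."""
    n = sx = sx2 = sy = sxy = 0.0
    for x, y in points:          # one pass, updating the running sums
        n += 1
        sx += x
        sx2 += x * x
        sy += y
        sxy += x * y
    d = n * sx2 - sx * sx        # common denominator of (19)
    if d == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    q = (n * sxy - sx * sy) / d
    r = (sy * sx2 - sx * sxy) / d
    return q, r

q, r = fit_line([(0, 1), (2, 1), (4, 4)])   # data of Example 2
print(q, r)
```

Only the five sums are stored, never the data points themselves, which is why a calculator with a handful of memory registers can do the job.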


A word of warning about the formulas in (19). Roundoff errors in 
computing the terms in the denominators of these formulas can sometimes 
seriously affect the accuracy of the results. In line 100 of the BASIC pro- 
gram, if N*SX2 and SX*SX were large numbers that were nearly equal, 
then their difference D may be very inaccurate. Also, one data point quite 
different from the rest (caused by an unusual event or a recording error) can 
significantly affect the values of q and r; such points are called outliers.
Exercise 5 illustrates the effect of outliers.

We now show a convenient shortcut for the linear regression model
$\hat{y} = qx + r$. With a computer or calculator programmed to do regression,
this shortcut does not save time, but it does eliminate roundoff-error
difficulties.

We shall perform an elementary transformation of x-coordinates.

$$x_i' = x_i - \bar{x}, \qquad \text{where } \bar{x} = \frac{1}{n}\sum x_i \tag{20}$$

The term $\bar{x}$ we use to shift the x-coordinate is just the average of the $x_i$. Note
that if the x-coordinates are integers (as is common) and are equally spaced
along the x-axis, their average will be the middle integer, if n is odd (or
midway between the middle two integers, if n is even). In the case in
Example 2, the average of the $x_i$ is (0 + 2 + 4)/3 = 2. Since the y-values are
unchanged, we are simply renumbering the x-axis (see Figure 4.9).

Figure 4.9 Figure 4.7 transformed so that $x' = 0$ is the mean of the $x'$-values.
The old regression line $\hat{y} = 0.75x + 0.5$ becomes $\hat{y} = 0.75x' + 2$, where the
constant 2 is the mean of the y-values.

In the new coordinate system, the average of the $x_i'$ will be 0; that
was the whole idea of the shift, to center the x-values around the origin. If
the average of the $x_i'$ is 0, so is the sum of the $x_i'$ (since the average is the
sum divided by n). Now in (19), applied to the shifted values, all products
involving $\sum x_i'$ are 0. The formulas for r and q in (19) simplify considerably,
to become

$$q' = \frac{\sum x_i'y_i}{\sum x_i'^2}, \qquad r' = \frac{\sum y_i}{n} \tag{21}$$

Here $r'$ is just the average of the $y_i$, and $q'$ is the same formula that we
obtained in (16) for the regression model $\hat{y} = qx$. Roundoff error can no
longer distort the denominator in (21) as was possible in (19).

Shifting the x-coordinate will not change the slope of a line, so $q'$
equals q in the original model $\hat{y} = qx + r$. The reader can verify with a
geometric argument that $r = r' - q\sum x_i/n$.

If we had also transformed the y-values by their average, then $r' = 0$.
However, the regression formula for $q'$ does not simplify further if we
transform the y-values, so a y-transformation serves no purpose. Also,
transforming just the y-values instead of the x-values will not simplify the
denominator in (19) as happened in (21).
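
The shortcut and the roundoff warning can both be seen in a small experiment. The Python sketch below is our own illustration: the function names are hypothetical, and the offset of 1e8 added to the x-values of Example 2 is a made-up value chosen only to provoke roundoff in double-precision arithmetic.

```python
def fit_line_naive(xs, ys):
    """Formulas (19), returning (q, r, D) where D is the shared denominator."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sx2 = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sx2 - sx * sx
    return (n * sxy - sx * sy) / d, (sy * sx2 - sx * sxy) / d, d

def fit_line_centered(xs, ys):
    """Shortcut (20)-(21): shift x by its mean, then undo the shift for r."""
    n = len(xs)
    x_bar = sum(xs) / n
    shifted = [x - x_bar for x in xs]             # x_i' = x_i - x_bar
    q = sum(s * y for s, y in zip(shifted, ys)) / sum(s * s for s in shifted)
    r_prime = sum(ys) / n                         # r' is the mean of the y-values
    return q, r_prime - q * x_bar                 # r = r' - q * x_bar

# Example 2 data, then the same data moved far from the origin
# (the offset 1e8 is a made-up value chosen to provoke roundoff).
ys = [1.0, 1.0, 4.0]
print(fit_line_centered([0.0, 2.0, 4.0], ys))
far = [1e8, 1e8 + 2, 1e8 + 4]
print("naive D:", fit_line_naive(far, ys)[2], "exact D: 24")
print("centered slope:", fit_line_centered(far, ys)[0])
```

Shifting all x-values by a constant should leave the slope at .75 and the exact denominator at 24. The centered version keeps the slope; the naive denominator is a small difference of two numbers near 9 x 10^16, which is where the accuracy is lost.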


Example 3. Predicting Printing Costs


A copy center bases its fees on the number of (duplicate) units that
have been ordered (a unit is 100 pages). Table 4.1 gives some sample
fees. Based on these sample fees, what would be a reasonable charge
for 15 units?

Table 4.1

Number of Units    Cost per Unit
       1               $6
       3                5.5
       5                5
      10                3.5
      12                3

Let us fit a line $\hat{y} = qx + r$ to these five data points, (1, 6),
(3, 5.5), (5, 5), (10, 3.5), (12, 3), and then determine $\hat{y}$ when x = 15.
Using the shortcut described above, we transform the coordinates
by subtracting 6.2 (= $\sum x_i/n$) from each $x_i$ to get shifted values $x_i'$.
Then (21) gives

$$\begin{aligned}
q' &= \frac{\sum x_i'y_i}{\sum x_i'^2} \\
&= \frac{(-5.2)\times 6 + (-3.2)\times 5.5 + (-1.2)\times 5 + 3.8\times 3.5 + 5.8\times 3}
{(-5.2)^2 + (-3.2)^2 + (-1.2)^2 + (3.8)^2 + (5.8)^2} \\
&\approx -.3
\end{aligned} \tag{22}$$

and $r' = \sum y_i/5 = 4.6$. Then

$$q = q' = -.3 \quad\text{and}\quad r = r' - q\,\frac{\sum x_i}{n} = 4.6 - (-.3)(6.2) \approx 6.5$$

Thus our regression model is $\hat{y} = -.3x + 6.5$. And when x = 15,
we obtain $\hat{y} = -.3 \times 15 + 6.5 = 2$.
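
The figures in Example 3 are rounded to one decimal place. The following Python sketch (our own, with a hypothetical helper `predict`) repeats the calculation of (21) on the Table 4.1 data without rounding, so the effect of the rounding can be seen.

```python
# Sample fees from Table 4.1: (number of units, cost per unit in dollars).
data = [(1, 6.0), (3, 5.5), (5, 5.0), (10, 3.5), (12, 3.0)]

n = len(data)
x_bar = sum(x for x, _ in data) / n           # mean of the x-values
y_bar = sum(y for _, y in data) / n           # mean of the y-values, r' in (21)

# Slope from (21) using the shifted values x_i' = x_i - x_bar.
num = sum((x - x_bar) * y for x, y in data)
den = sum((x - x_bar) ** 2 for x, _ in data)
q = num / den
r = y_bar - q * x_bar                          # undo the shift

def predict(x):
    """Estimated cost per unit when x units are ordered."""
    return q * x + r

print(round(q, 4), round(r, 4), round(predict(15), 4))
```

Carried to more places, the slope is about -.278 rather than -.3, and the estimate at x = 15 comes out near 2.16 rather than 2. The difference is due entirely to rounding q and r before substituting.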


Next we give an example of a nonlinear regression problem and show 
how it can be converted into simple linear regression. 


Example 4. Nonlinear Regression 


Consider the following pairs of x- and y-values; the points are shown 
in Figure 4.10a. The x-values could be the age of a wine and the 
y-values ratings by expert wine tasters. 


Suppose that by inspection and experience we believe that the
relationship between x- and y-values is best explained by an exponential
model

$$\hat{y} = re^{qx} \tag{23}$$

Figure 4.10 (a) Data points and curve $y = 4e^{x/2}$. (b) Data transformation by
$y' = \log y$.


Then we perform the following (nonlinear) transformation on the
y-values.

$$y' = \log y \tag{24}$$

The new data values are


Clearly, we have a fairly good linear fit here (see Figure 4.10b).
(One can also plot the original data points on log paper.) We let $\hat{y}'$ be
the estimate for the transformed problem:

$$\hat{y}_i' = \log \hat{y}_i = \log re^{qx_i} = \log r + qx_i$$

Letting $r' = \log r$, we have the standard simple linear regression model

$$\hat{y}' = qx + r' \tag{25}$$
That is, the logarithm function (24) transforms exponential curves into
straight lines. We can apply the formulas in (19) to determine q and
$r'$ from the transformed data and insert these into (23) to get a model
in the original coordinate system. We obtain

$$q = \tfrac{1}{2}, \qquad r' = 1.4, \qquad r = e^{r'} \approx 4$$


The curve $y = 4e^{x/2}$ is plotted in dashed lines in Figure 4.10a.

This technique can be used any time that there is a mapping that
transforms the proposed regression model, such as (23), into a simple linear
regression model. A statistician will test out several different regression
models and settle on the one that gives the best fit (e.g., that minimizes the
sum of the squares of the errors).

Another way to perform nonlinear regression is to fit the y-values to a
polynomial in x, such as $\hat{y}_i = ax_i^3 + bx_i^2 + cx_i + d$, where we treat $x_i^3$,
$x_i^2$, and $x_i$ like three distinct variables, $v_i$, $w_i$, and $x_i$. We cannot solve a
problem with several variables on the right yet, but we will come back to
least-squares polynomial fitting in Section 5.3.
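
The logarithmic transformation used in Example 4 can be sketched in a few lines of Python. This is our own illustration under stated assumptions: the data below are synthetic and noise-free (they are NOT the wine data of Example 4), generated from a made-up curve y = 3e^{.2x}, and the function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares (q, r) for y = q*x + r, using formulas (19)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sx2 = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sx2 - sx * sx
    return (n * sxy - sx * sy) / d, (sy * sx2 - sx * sxy) / d

def fit_exponential(xs, ys):
    """Fit y = r * exp(q*x) by linear regression on y' = log y."""
    q, r_prime = fit_line(xs, [math.log(y) for y in ys])
    return q, math.exp(r_prime)                # r = e^(r')

# Synthetic, noise-free data (NOT the data of Example 4): y = 3 * exp(0.2 x).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * math.exp(0.2 * x) for x in xs]
q, r = fit_exponential(xs, ys)
print(q, r)
```

Note that this procedure minimizes the sum of squared errors of the transformed values log y, which in general is not the same as minimizing the squared errors of y itself. The method also requires every y-value to be positive.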


Optional 


We conclude this section with a vector calculus derivation of the general 
multivariable regression model in which we allow y to be a linear function 
of several input values. The same results will be obtained in Section 5.3 
more simply using vector space techniques. 

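To be concrete, we consider the following model for $\hat{y}_i$:

$$\hat{y}_i = q_1v_i + q_2w_i + q_3x_i + r \tag{26}$$

In matrix notation, we write

$$\begin{aligned}
\hat{\mathbf{y}} &= q_1\mathbf{v} + q_2\mathbf{w} + q_3\mathbf{x} + r\mathbf{1} && \text{(27a)} \\
&= \mathbf{X}\mathbf{q} && \text{(27b)}
\end{aligned}$$

where $\mathbf{q} = [q_1, q_2, q_3, r]$ and $\mathbf{X} = [\mathbf{v}\ \mathbf{w}\ \mathbf{x}\ \mathbf{1}]$.
Now let us compute SSE and its derivative in terms of (27a). Later
we do it in terms of (27b).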


$$\text{SSE} = (q_1\mathbf{v} + q_2\mathbf{w} + q_3\mathbf{x} + r\mathbf{1} - \mathbf{y}) \cdot (q_1\mathbf{v} + q_2\mathbf{w} + q_3\mathbf{x} + r\mathbf{1} - \mathbf{y}) \tag{28}$$

By the informal vector calculus used previously,

$$\frac{\partial\,\text{SSE}}{\partial q_1} = 2\mathbf{v} \cdot (q_1\mathbf{v} + q_2\mathbf{w} + q_3\mathbf{x} + r\mathbf{1} - \mathbf{y}) \tag{29}$$

The derivatives with respect to $q_2$, $q_3$, and r will be similar. Multiplying
$\mathbf{v}$ by the vectors in the parentheses in (29) and setting the result equal to 0,
we obtain

$$(\mathbf{v} \cdot \mathbf{v})q_1 + (\mathbf{v} \cdot \mathbf{w})q_2 + (\mathbf{v} \cdot \mathbf{x})q_3 + (\mathbf{v} \cdot \mathbf{1})r = \mathbf{v} \cdot \mathbf{y} \tag{30}$$

Three other similar equations will be obtained from the other three
derivatives. This gives us four equations in the four unknowns $q_1$, $q_2$, $q_3$, and r
that can be solved by Gaussian elimination. (Recall that it was this system
of equations for estimating regression unknowns that forced Gauss to develop
Gaussian elimination.)


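Let us indicate how (30) and its sister equations can be obtained as a
matrix equation. We write the SSE using (27b):

$$\begin{aligned}
\text{SSE} &= (\hat{\mathbf{y}} - \mathbf{y}) \cdot (\hat{\mathbf{y}} - \mathbf{y}) = (\mathbf{X}\mathbf{q} - \mathbf{y}) \cdot (\mathbf{X}\mathbf{q} - \mathbf{y}) \\
&= \mathbf{X}\mathbf{q} \cdot \mathbf{X}\mathbf{q} - 2\,\mathbf{X}\mathbf{q} \cdot \mathbf{y} + \mathbf{y} \cdot \mathbf{y}
\end{aligned} \tag{31}$$

By matrix algebra $\mathbf{X}\mathbf{q} \cdot \mathbf{y} = \mathbf{X}^T\mathbf{y} \cdot \mathbf{q}$ (see Exercise 14). Then by vector
calculus, the derivative of $\mathbf{X}^T\mathbf{y} \cdot \mathbf{q}$ with respect to $\mathbf{q}$ is $\mathbf{X}^T\mathbf{y}$. By more
advanced vector calculus, the derivative of $\mathbf{X}\mathbf{q} \cdot \mathbf{X}\mathbf{q}$ is $2(\mathbf{X}^T\mathbf{X})\mathbf{q}$. Thus

$$\frac{d\,\text{SSE}}{d\mathbf{q}} = 2(\mathbf{X}^T\mathbf{X})\mathbf{q} - 2\mathbf{X}^T\mathbf{y} \tag{32}$$

Setting (32) equal to 0, we obtain the famous normal equations of regression,

$$\mathbf{X}^T\mathbf{X}\mathbf{q} = \mathbf{X}^T\mathbf{y} \tag{33}$$

whose solution, using inverses, is

$$\mathbf{q} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \tag{34}$$

The matrix expression $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is called the pseudoinverse of $\mathbf{X}$, since
it allows us to solve (approximately) the system $\mathbf{X}\mathbf{q} = \mathbf{y}$. Pseudoinverses
are discussed in Section 5.3.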
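
To make the normal equations concrete, here is a minimal Python sketch (our own, using only the standard library) that forms X^T X and X^T y and solves (33) by Gaussian elimination rather than by computing an inverse. All function names are hypothetical, and the sketch assumes X^T X is invertible, that is, the columns of X are not redundant. It is tried on Example 2, where X has the two columns x and 1.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            aug[i] = [u - factor * v for u, v in zip(aug[i], aug[col])]
    z = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def least_squares(X, y):
    """Solve the normal equations (X^T X) q = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Example 2 as a two-column problem: X = [x 1], so q = [q, r].
X = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
y = [1.0, 1.0, 4.0]
print(least_squares(X, y))
```

The same function handles model (26) unchanged: one simply supplies a matrix X with the four columns v, w, x, and 1.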


Section 4.2 Exercises


Summary of Exercises 


Exercises 1-8 involve regression models, Exercises 6—8 being nonlinear. 
Exercises 9—14 are theoretical. 


1. Seven students earned the following scores on a test after studying the 
subject matter for different numbers of weeks: 


Student A B C D E F G


Length of Study 


Test Score 5° We WF. BGs UR O68 30 


(a) Fit these data with a regression model of the form $\hat{y} = qx$, where
x is number of weeks studied and y is the test score. Plot the
observed scores and the predicted scores. What is the sum of
squares of errors?

(b) Fit these data with a regression model of the form $\hat{y} = qx + r$.
Plot the observed scores and the predicted scores. What is the sum
of squares of errors?

(c) Repeat the calculations in part (b) by first shifting the x-values to
make the average x-value be 0 [see equations (21)].


2. The following data indicate the numbers of accidents that bus drivers 
had in one year as a function of the numbers of years on the job. 


Years on Job ened. . ee 1B Ee 
Accidents a SS ok Ae SD 
(a) Fit these data with a regression model of the form $\hat{y} = qx$, where
x is number of years experience and y is number of bus accidents.
Plot the observed numbers of accidents and the predicted numbers.
What is the sum of squares of errors?

(b) Fit these data with a regression model of the form $\hat{y} = qx + r$.
Plot the observed numbers of accidents and the predicted numbers.
What is the sum of squares of errors? Is this model significantly
better than the model in part (a)?

(c) Repeat the calculations in part (b) by first shifting the x-values to
make the average x-value be 0 [see equations (21)].


3. (a) Reverse the roles of y and x in Exercise 2 (now y is number of
years of experience) and fit the regression model $\hat{y} = qx + r$ to
these data. Plot the observed years experience and the predicted
numbers. What is the sum of squares of errors?

(b) Compare your results with those in Exercise 2, part (b) or (c)— 
why are the numbers not the same? 


4. (a) The following data show the GPA and the job salary (5 years after 
graduation) of six mathematics majors from Podunk U. 


Salary 25,000 38,000 28,000 35,000 30,000 32,000 


Fit these data with a regression model of the form $\hat{y} = qx + r$.
Plot the observed salaries and the predicted salaries. Is the regres- 
sion fit reasonably good? 

(b) Repeat the calculations in part (a) by first shifting the x-values to 
make the average x-value be 0 [see equations (21)]. 


5. Consider the following relationship between the height of a student’s
mother and the number of F’s the student gets at Podunk U.

             Mother’s Height
Student         (inches)        Number of F’s
   A               62                 2
   B               65                 6
   C               59                 1
   D               63                 4
   E               60                11
   F               69                 6
   G               63                 1
   H               60                 3


(a) Determine q and r in the regression model $\hat{y} = qx + r$ (where x
is mother’s height and y is number of F’s).

(b) Delete student E from your study and repeat part (a). Does deleting
E make much of a difference?

(c) Repeat the calculations in part (b) by first shifting the x-values to
make the average x-value be 0 [see equations (21)].


6. Consider the following set of data, which are believed to obey
(approximately) a relation of the form $y = qx^2$:


x-value ie ee 4 a» § 7 


y-value a Le DSSS Fee Sar BH 


Perform a transformation $y' = f(y)$ on y so that the regression model
$\hat{y}' = q'x$ is fairly accurate. Then determine $q'$, reverse the
transformation to determine q, and plot the curve $\hat{y} = qx^2$.


7. Consider the following set of data, which are believed to obey an inverse
relation of the form $y = q(1/x)$.


Experience (x) 


Number of Accidents (y) Re, NOW eet: Te’ Se 


Perform a transformation $y' = f(y)$ on y so that the regression model
$\hat{y}' = q'x$ is fairly accurate. Then determine $q'$, reverse the
transformation to determine q, and plot the curve $\hat{y} = q(1/x)$.


8. Consider the following set of data, which are believed to obey a square
root law $y = q\sqrt{x}$.

Age 6, 5 wm Go 2 % 40


Strength 7 a2 (Bake. SF - 290 


Perform a transformation $y' = f(y)$ on y so that the regression model
$\hat{y}' = q'x$ is fairly accurate. Then determine $q'$, reverse the
transformation to determine q, and plot the curve $\hat{y} = q\sqrt{x}$.


9. Verify the calculation of the partial derivatives in (11). 


10. Verify (17). 


11. Verify the expression $r = r' - q\sum x_i/n$, where $r'$ and q (= $q'$)
are the regression coefficients in the transformed problem [see (21)].


12. Show that the formula for q and r makes the regression line $\hat{y} = qx + r$
go through the point $(\bar{x}, \bar{y})$, where $\bar{x}$ is the average x-value and $\bar{y}$ is the
average y-value.

Hint: First shift the x-values so that $\bar{x} = 0$.


13. In vector notation, the sum of squares to be minimized in the model
$\hat{y} = qx + r$ is SSE = $(q\mathbf{x} + r\mathbf{1} - \mathbf{y}) \cdot (q\mathbf{x} + r\mathbf{1} - \mathbf{y})$. Compute the
vector derivative of this expression with respect to q and with respect
to r [see (29)]. Show that the two derivatives are the same as the
expressions in (17).


14. Verify that $\mathbf{X}\mathbf{q} \cdot \mathbf{y} = \mathbf{X}^T\mathbf{y} \cdot \mathbf{q}$.


Linear Models in the 
Physical Sciences and 
Differential Equations 


The examples in this section deal with physical-science applications. Until 
two decades ago, the physical sciences were almost the only disciplines that 
used mathematics. In those days everyone who studied calculus also studied 
physics. Today the majority of American students who study calculus and 
related mathematics take little physics. For them, physical-science applica- 
tions of mathematics are very hard to follow, since these applications usually 
depend on a general familiarity with the physical problem being modeled. 
Further, because linear models play such a large role in the physical sciences, 
the right place to study them is in a physical science course where the 
mathematics and science are naturally integrated. On the other hand, students 
experienced with physical-science linear models can learn much by seeing 
how similar models are used in other disciplines.

For this reason, none of the models examined in depth in this book
will come from the physical sciences. The linear models, such as Markov
chains and growth models, that we shall repeatedly use to illustrate concepts
are “neutral” models that are easily comprehended by all students.
However, for completeness in this section a few models based on basic physical
laws will be presented. (Social scientists should consider this section as the
book’s “College of Arts and Letters” distribution requirement in science.)

The same reasoning applies to differential equations. Students who 
will use differential equations in courses in their major should have a full 
course in differential equations. We will only sample the subject to see some 
basic ways that linear systems of equations arise in differential equations. 


Example 1. Balancing Chemical Equations 


In a chemical reaction, a collection of molecules is brought together
in the proper setting (e.g., in boiling water) and they rearrange
themselves into new molecules. In this process the number of atoms of each
element is conserved. If the molecules put into the reaction have a
total of 12 hydrogen (H) atoms, the resulting set of molecules must
also contain 12 H’s. Consider the reaction in which permanganate
($\mathrm{MnO_4}$) and hydrogen (H) ions combine to form manganese (Mn) and
water ($\mathrm{H_2O}$):

$$\mathrm{MnO_4} + \mathrm{H} \rightarrow \mathrm{Mn} + \mathrm{H_2O} \tag{1}$$

where O represents oxygen. Let $x_1$ be the number of permanganate
ions, $x_2$ the number of hydrogen ions, $x_3$ the number of manganese
atoms, and $x_4$ the number of water molecules. To have the same
number of atoms in the molecules on each side of the reaction, we obtain
the system of equations

$$\begin{array}{ll}
\text{H:} & x_2 = 2x_4 \\
\text{Mn:} & x_1 = x_3 \\
\text{O:} & 4x_1 = x_4
\end{array} \tag{2a}$$

or

$$\begin{array}{rcl}
x_2 - 2x_4 &=& 0 \\
x_1 - x_3 &=& 0 \\
4x_1 - x_4 &=& 0
\end{array} \tag{2b}$$


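Notice that we have four unknowns but only three equations.
Let us solve this system using elimination by pivoting. We pivot
on entry (2, 1) to obtain

$$\begin{array}{rcl}
x_2 - 2x_4 &=& 0 \\
x_1 - x_3 &=& 0 \\
4x_3 - x_4 &=& 0
\end{array}$$

We do not need to pivot in column 2. Next we pivot on entry (3, 3)

$$\begin{array}{rcl}
x_2 - 2x_4 &=& 0 \\
x_1 - \tfrac{1}{4}x_4 &=& 0 \\
x_3 - \tfrac{1}{4}x_4 &=& 0
\end{array}
\qquad\text{or}\qquad
\begin{array}{rcl}
x_2 &=& 2x_4 \\
x_1 &=& \tfrac{1}{4}x_4 \\
x_3 &=& \tfrac{1}{4}x_4
\end{array}$$

As vectors, the solutions have the form

$$[\tfrac{1}{4}x_4,\ 2x_4,\ \tfrac{1}{4}x_4,\ x_4] \quad\text{or}\quad x_4[\tfrac{1}{4},\ 2,\ \tfrac{1}{4},\ 1] \tag{3}$$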


For example, if $x_4 = 4$, then $x_2 = 8$, $x_1 = x_3 = 1$, and the reaction
equation becomes

$$\mathrm{MnO_4} + 8\mathrm{H} \rightarrow \mathrm{Mn} + 4\mathrm{H_2O}$$

The solution we obtain makes the amounts of the first three types
of molecules fixed ratios of the amount of the fourth type, which we are
free to give any value (i.e., $x_4$ is a free variable). This makes sense in
physical terms, since doubling our chemical “recipe” of inputs should
just double the output.

In another series of pivots, say at entries (1, 2), (2, 3), and
(3, 4), we get $x_1$ as the free variable.

$$\begin{array}{rcl}
-8x_1 + x_2 &=& 0 \\
-x_1 + x_3 &=& 0 \\
-4x_1 + x_4 &=& 0
\end{array}
\qquad\text{or}\qquad
\begin{array}{rcl}
x_2 &=& 8x_1 \\
x_3 &=& x_1 \\
x_4 &=& 4x_1
\end{array}$$

yielding solution vectors

$$[x_1,\ 8x_1,\ x_1,\ 4x_1] \quad\text{or}\quad x_1[1,\ 8,\ 1,\ 4] \tag{4}$$

In this solution the values of the last three variables are fixed ratios of
the first variable.
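
Both forms of the solution can be checked by substituting them back into system (2b). The Python sketch below is our own illustration; it uses exact fractions so that the quarter-multiples in (3) cause no roundoff, and the function names are hypothetical.

```python
from fractions import Fraction

# Coefficient matrix of system (2b); columns are x1, x2, x3, x4.
A = [
    [0, 1, 0, -2],   # H:  x2 - 2*x4 = 0
    [1, 0, -1, 0],   # Mn: x1 - x3   = 0
    [4, 0, 0, -1],   # O:  4*x1 - x4 = 0
]

def residual(coeffs):
    """Left-hand sides of (2b) for a candidate solution vector."""
    return [sum(a * c for a, c in zip(row, coeffs)) for row in A]

def solution_from_x4(x4):
    """Solution (3): every unknown as a fixed multiple of the free variable x4."""
    x4 = Fraction(x4)
    return [x4 / 4, 2 * x4, x4 / 4, x4]

def solution_from_x1(x1):
    """Solution (4): every unknown as a fixed multiple of the free variable x1."""
    x1 = Fraction(x1)
    return [x1, 8 * x1, x1, 4 * x1]

print(solution_from_x4(4), residual(solution_from_x4(4)))
print(solution_from_x1(1), residual(solution_from_x1(1)))
```

Choosing $x_4 = 4$ in (3) and $x_1 = 1$ in (4) gives the same vector, as it must, since (3) and (4) describe the same line of solutions with different free variables.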


Example 2. Currents in an Electrical Network 


In this example we compute the current in different parts of the
electrical network in Figure 4.11a. There are three basic laws that are used
to analyze simple electrical networks. The following review of ele- 
mentary physics summarizes the concepts behind these laws. A battery 
or other source of electrical power ‘‘forces’’ electricity through elec- 
trical devices, such as a light or a doorbell. The force applied to a 
device depends on two factors: (i) the resistance of the device—a 
measure of how hard it is to push electricity through the device; and 
(ii) the current, the rate at which electricity flows through the device. 
The fundamental law of electricity, due to Ohm, says 


Ohm’s Law: force = resistance × current.

Figure 4.11 (a) Electrical network. (b) Associated graph.


Force is measured in volts, current in amperes, and resistance in ohms.
In terms of these units, Ohm’s law is


volts = ohms × amperes


A battery supplies a fixed force into a network. Batteries send their
electricity out from a positive terminal and receive it back at a negative
terminal. All the voltage (i.e., force) provided by a battery is used up
by the time the electricity returns to the terminal. This property of
voltage is called


Kirchhoff’s Voltage Law. In any cycle (closed path) in a network,
the sum of the voltages used by resistive devices equals the voltage
from the battery(ies).


The final law says that current is conserved at any branch node.


Kirchhoff’s Current Law. The sum of the currents flowing into any
node is equal to the sum of the currents flowing out of the node.


Let us use these three laws to derive a set of linear equations
modeling the behavior of currents in the network in Figure 4.11a. Later
we shall express each of Kirchhoff’s laws in the form of a matrix
equation.


Assume that the battery delivers 14 volts. Further let $c_1$ be the
current flowing through the section of the network with the battery and
the light, whose resistance is 1 ohm; let $c_2$ be the current through the
section with the doorbell, whose resistance is 2 ohms; and let $c_3$ be
the current through the section with the motor, whose resistance is
4 ohms. Currents flow in the direction of the arrows along these edges.
By the current law, at node s we have the following equation:

$$c_1 = c_2 + c_3 \tag{5}$$

Note that at node t we get the same equation. Next we use the voltage
law. Following the cycle, battery to light to doorbell to battery, we
have


voltage at light + voltage at doorbell = battery voltage (6a)


We use Ohm’s law (voltage = resistance × current) to determine
the voltages at the devices. The battery voltage is 14. So (6a) becomes

$$1c_1 + 2c_2 = 14 \tag{6b}$$

Following a second cycle, battery to light to motor to battery, we have

$$1c_1 + 4c_3 = 14 \tag{7}$$

There is still a third cycle that we could use, node s to doorbell to
node t to motor (going against the current flow) to node s. If we go
against the current flow, the voltage is treated as negative. So the
voltage law for this cycle is

$$2c_2 - 4c_3 = 0 \tag{8}$$


Note that equation (8) is simply what is obtained when we
subtract (7) from (6b). Intuitively, the third cycle is the net result of going
forward on the first cycle and then backward on the second cycle.

We need to solve the three equations (5), (6b), and (7) in the three
unknown currents, which we write as

$$\begin{array}{rcl}
c_1 - c_2 - c_3 &=& 0 \\
c_1 + 2c_2 &=& 14 \\
c_1 + 4c_3 &=& 14
\end{array} \tag{9}$$

Solving by Gaussian elimination, we find

$$c_1 = 6 \text{ amperes}, \qquad c_2 = 4 \text{ amperes}, \qquad c_3 = 2 \text{ amperes} \tag{10}$$
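
The elimination for system (9) is short enough to do by hand, but it is also a convenient test case for a program. The following Python sketch is our own: it carries out elimination by pivoting (clearing each column above and below the pivot) in exact fractions. The function name `gaussian_solve` is hypothetical, and the sketch assumes the system has a unique solution.

```python
from fractions import Fraction

def gaussian_solve(a, b):
    """Solve a square system a c = b exactly by elimination with pivoting."""
    n = len(a)
    rows = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry to pivot on.
        pivot = next(i for i in range(col, n) if rows[i][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for i in range(n):
            if i != col:
                factor = rows[i][col]
                rows[i] = [u - factor * v for u, v in zip(rows[i], rows[col])]
    return [row[n] for row in rows]

# System (9): unknowns c1, c2, c3.
A = [[1, -1, -1],
     [1,  2,  0],
     [1,  0,  4]]
b = [0, 14, 14]
currents = gaussian_solve(A, b)
print(currents)
```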


A general analysis of currents in networks involves a combination of
physics and mathematics. The critical mathematical problem is proving that
there will always be enough different equations to determine uniquely all
the currents.

Before leaving this problem, let us show how this problem can be cast
in matrix notation. Kirchhoff’s current law can be restated in matrix form
using the incidence matrix M(G) of the underlying graph (when batteries
and resistive devices are ignored). Figure 4.11b shows the graph G
associated with the network in Figure 4.11a. G contains two nodes s, t and three
edges $e_1$, $e_2$, $e_3$ joining s and t. In this graph each edge has a direction; the
directions for G are shown in Figure 4.11b. The current $c_j$ in edge $e_j$ is
positive if it flows in the direction of $e_j$ and negative if it flows in the opposite
direction.

Recall from Section 2.3 that M(G) has a row for each node of G and
a column for each edge. In the case of directed edges, entry $m_{ij} = +1$ if
the jth edge is directed into the ith node, $= -1$ if the jth edge is directed out
from the ith node, and $= 0$ if the jth edge does not touch the ith node. For
the graph G in Figure 4.11b, M(G) is

$$M(G) = \begin{array}{c|rrr}
 & e_1 & e_2 & e_3 \\ \hline
s & +1 & -1 & -1 \\
t & -1 & +1 & +1
\end{array} \tag{11}$$

Kirchhoff’s current law says that the flow into a node equals the flow
out of the node, or in other words, the net current flow is zero. At node s,
this means that

$$c_1 - c_2 - c_3 = 0 \tag{12}$$

Observe that currents going into s are associated with edges that have +1
in row s of M(G), and currents going out are associated with edges that
have −1 in row s of M(G). If $\mathbf{m}_s$ denotes row s of M(G) and $\mathbf{c}$ is the vector
of currents $(c_1, c_2, c_3)$, then (12) can be rewritten as

$$\mathbf{m}_s \cdot \mathbf{c} = 0 \tag{13}$$

This equation is true for all rows of M(G). Thus (13) generalizes to

$$\text{Kirchhoff’s current law:} \quad M(G)\mathbf{c} = \mathbf{0} \tag{14}$$

This result is true for any associated graph G.

We can also define a special cycle matrix K(G) for G with a row for
each cycle (closed path) of G and a column for each edge. Let $r_j$ be the
resistance in the jth edge. Then define entry $k_{ij}$ of K(G) $= +r_j$ if the ith
cycle uses the jth edge traversing this edge in the direction of its arrow, $-r_j$
if the ith cycle uses the jth edge in the opposite direction of its arrow, and
$= 0$ otherwise. For example, K(G) in Example 2 would be, with cycles
listed in the order they were discussed,

$$K(G) = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 0 & 4 \\ 0 & 2 & -4 \end{bmatrix}$$

As occurred in Example 2, some cycles in a graph are always redundant
and there is a simple rule for picking a minimal set of cycles for K(G).
If $\mathbf{k}_i$ is the ith row of K(G), then $\mathbf{k}_i \cdot \mathbf{c}$ will be the voltages used on the ith
cycle. If $f_i$ is the voltage force of batteries on the ith cycle, then Kirchhoff’s
voltage law becomes

$$\mathbf{k}_i \cdot \mathbf{c} = f_i \tag{15}$$

and if $\mathbf{f}$ is the vector of $f_i$’s, we have

$$\text{Kirchhoff’s voltage law:} \quad K(G)\mathbf{c} = \mathbf{f} \tag{16}$$

Our current problem can now be stated: Solve the system of equations (14)
and (16) for $\mathbf{c}$.

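
The matrix forms (14) and (16) are easy to verify for the network of Example 2. The Python sketch below is our own check: it multiplies M(G) and K(G) by the currents found in (10). The vector f = (14, 14, 0) is read off from equations (6b), (7), and (8), the third cycle containing no battery; `mat_vec` is a hypothetical helper.

```python
# Incidence matrix (11): rows are nodes s, t; columns are edges e1, e2, e3.
M = [[+1, -1, -1],
     [-1, +1, +1]]

# Cycle matrix K(G): rows are the three cycles of Example 2, entries are
# signed resistances (ohms) of the edges each cycle traverses.
K = [[1, 2, 0],
     [1, 0, 4],
     [0, 2, -4]]

f = [14, 14, 0]      # battery voltage met on each cycle
c = [6, 4, 2]        # currents found in (10), in amperes

def mat_vec(matrix, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

print("M(G)c =", mat_vec(M, c))   # current law (14): should be the zero vector
print("K(G)c =", mat_vec(K, c))   # voltage law (16): should equal f
```
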
The next four examples involve differential equations. A common
mathematical model for many dynamic systems, such as falling objects,
vibrating strings, or economic growth, is a differential equation of the form

$$y''(t) = a_1y'(t) + a_0y(t) \tag{17}$$

where y(t) is a function that measures the “position” of the quantity, $y'(t)$
denotes the first derivative with respect to t (representing time), $y''(t)$ denotes
the second derivative, and $a_1$ and $a_0$ are constants. The differential equation
is called linear because the right side is a linear combination of the function
and its derivative. Solutions of (17) are functions of the form

$$y(t) = Ae^{kt} \tag{18}$$

where e is the base of natural logarithms, and A and k are constants that
depend on the particular problem. This form of solution also works if higher
derivatives are involved in the linear differential equation.


Example 3. Differential Equation for 
Instantaneous Interest 


The simple differential equation

$$y'(t) = .10y(t) \tag{19}$$

describes the amount of money y(t) in a savings account after t years
when the account earns 10% interest compounded instantaneously.
Recall that $y'(t)$ is the instantaneous rate of change, or graphically, the
slope, of y(t). Thus (19) says that the instantaneous growth rate of the
savings account is 10% of the account’s current value.

Recall that the derivative of $e^{kt}$ is $ke^{kt}$. Let us try setting $y(t) = Ae^{kt}$
[as given in (18)]. Now (19) becomes

$$kAe^{kt} = .10Ae^{kt} \tag{20}$$

Dividing both sides of (20) by $Ae^{kt}$, we obtain k = .10. So the solution
to (19) is

$$y(t) = Ae^{.10t} \tag{21}$$

The constant A is still to be determined because to know how much
money we shall have after t years, we must know how much we started
with. Suppose that we started with 1000 dollars. The starting time is
t = 0. Thus we have (using the fact $e^0 = 1$)

$$1000 = y(0) = Ae^0 = A \tag{22}$$

Then the solution of (19) with y(0) = 1000 is

$$y(t) = 1000e^{.10t} \tag{23}$$
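
To see what “compounded instantaneously” means numerically, the following Python sketch (our own illustration, with hypothetical function names) compares solution (23) with the balance obtained when the 10% interest is credited m times a year, and checks by a finite-difference estimate that (23) satisfies (19). The 5-year horizon is an arbitrary choice.

```python
import math

def balance_continuous(principal, rate, years):
    """Solution (23): y(t) = principal * e^(rate * t)."""
    return principal * math.exp(rate * years)

def balance_compounded(principal, rate, years, periods_per_year):
    """Balance when interest is credited a finite number of times per year."""
    growth = 1 + rate / periods_per_year
    return principal * growth ** (periods_per_year * years)

def growth_rate(t, h=1e-6):
    """Central-difference estimate of y'(t) for the solution (23)."""
    return (balance_continuous(1000, 0.10, t + h)
            - balance_continuous(1000, 0.10, t - h)) / (2 * h)

for m in (1, 12, 365, 10**6):
    print(m, balance_compounded(1000, 0.10, 5, m))
print("continuous:", balance_continuous(1000, 0.10, 5))
print("y'(5) vs .10*y(5):", growth_rate(5), 0.10 * balance_continuous(1000, 0.10, 5))
```

As m grows, the compounded balance increases toward the continuous one, which is the sense in which (19) models instantaneous compounding.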


Example 4. Solving Second-Order Linear 
Differential Equations 


Consider the following differential equation that might describe the
height of a falling particle in a special force field.

$$y''(t) = 6y'(t) - 8y(t) \tag{24}$$

This differential equation is called a second-order equation because it
involves the second derivative. To solve this equation, we also need
to know the starting conditions, what are the initial height y(0) and the
initial speed $y'(0)$. Here the derivative $y'(t)$ measures speed, that is,
the rate of change of the height. Suppose in this problem that

$$y(0) = 100 \quad\text{and}\quad y'(0) = -20 \tag{25}$$


We solve this problem in two stages. First we substitute (18) for
y(t) in (24).

$$\frac{d^2}{dt^2}Ae^{kt} = 6\,\frac{d}{dt}Ae^{kt} - 8Ae^{kt} \tag{26}$$

Recall that the second derivative of $e^{kt}$ is $k^2e^{kt}$. So (26) becomes

$$k^2Ae^{kt} = 6kAe^{kt} - 8Ae^{kt} \tag{27}$$

If we divide by $Ae^{kt}$, (27) becomes

$$k^2 = 6k - 8 \quad\text{or}\quad k^2 - 6k + 8 = 0 \tag{28}$$

Equation (28) is called the characteristic equation of the linear
differential equation (24). The roots of (28) are easily verified to be 2
and 4. So we have two possible types of solutions to (24).

$$y(t) = Ae^{4t} \quad\text{and}\quad y(t) = A'e^{2t}$$

The constants A and $A'$ can have any value and these solutions
will still satisfy (24). In fact, it can readily be checked [see Exercise
16, part (b)] that any linear combination of the basic solutions $e^{4t}$ and
$e^{2t}$ is a solution. Thus

$$y(t) = Ae^{4t} + A'e^{2t} \tag{29}$$


is the general form of a solution to (24). The constants A and A’ depend 
on the starting values. From (25) we have 


100 = (0) = Ae® + A’e® (30a) 
—20 = y'(0) = 4Ae® + 2A'e° 
which simplifies to 
A+ A’ = 100 (30b) 
4A + 2A’ = —20 


In (30a), we obtain y'(0) by differentiating (29) and setting t = 0. 
Now we have our old “‘friend,’’ a system of two equations in two 
unknowns. We solve (30b) and obtain 


A = -110 and A’ = 210 (31) 
Substituting these values in (29), we obtain the required solution 


y(t) = —110e + 210¢” (32) 
& 


The calculations for any other second-order differential equation would 
proceed in a similar fashion: First substitute (18) in the differential equation 
to obtain the characteristic equation [as in (28)] and solve for its roots; then 
determine A and A’ from the pair of equations for starting values [as in (30)]. 
This method generalizes to kth-order differential equations; then the char- 
acteristic equation has k roots, we need k initial values, and we have to 
solve k equations in k unknowns. 
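
The two stages just described can be sketched numerically. The following is only an illustration in Python with NumPy (neither is used elsewhere in this book); the variable names are our own, and the sketch assumes the characteristic roots are real and distinct, as they are in Example 4.

```python
import numpy as np

# Stage 1: characteristic equation k^2 - 6k + 8 = 0 for y'' = 6y' - 8y, as in (28).
roots = np.roots([1.0, -6.0, 8.0])         # coefficients, highest power first
k1, k2 = sorted(roots.real, reverse=True)  # the roots 4 and 2

# Stage 2: starting values (25), y(0) = 100 and y'(0) = -20, give the system (30b).
M = np.array([[1.0, 1.0],
              [k1, k2]])
rhs = np.array([100.0, -20.0])
A, A_prime = np.linalg.solve(M, rhs)       # the constants A and A' of (31)

def y(t):
    """Solution of the form (29): y(t) = A e^{k1 t} + A' e^{k2 t}."""
    return A * np.exp(k1 * t) + A_prime * np.exp(k2 * t)
```

For a kth-order equation the same pattern would apply with a k-by-k system for the constants, provided the roots are distinct.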

For completeness, we note that if the two roots of the characteristic 
equation (28) were the same, such as 2 and 2, then the starting value equa- 
tions cannot be solved, since the two equations of (30a) will be the same. 
In the case of identical roots of the characteristic equation, y(t) instead has 
the form 


\[ y(t) = Ae^{rt} + A'te^{rt} \qquad (33) \]

where r is the double root. It is an exercise to check that in the case of a 
multiple root (and only then), \(te^{rt}\) is also a solution to the differential 
equation. 


Example 5. A System of Differential Equations 


Let us consider a pair of first-order differential equations which de- 
scribe motion of an object in x-y space with one equation governing 
the x-coordinate and one the y-coordinate. 


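\[
\begin{aligned}
x'(t) &= 2x(t) - y(t) \\
y'(t) &= -x(t) + 2y(t)
\end{aligned}
\qquad (34)
\]

The starting values are x(0) = y(0) = 1. Let u(t) be the vector function 
u(t) = (x(t), y(t)). Then (34) can be written in matrix notation as 

\[ u'(t) = Bu(t), \quad\text{where}\quad B = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \qquad (35) \]

and \(u(0) = [1, 1] = \mathbf{1}\). In Example 3 we saw that 

\[ y'(t) = by(t),\ y(0) = A \quad\Longrightarrow\quad y(t) = Ae^{bt} \qquad (36) \]

Substituting B for b and \(\mathbf{1}\) for A in (36), we obtain the solution to (35): 

\[ u(t) = e^{Bt}\mathbf{1} \qquad (37) \]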


A matrix in the exponent looks strange. But one definition of \(e^x\) is in 
terms of the power series. 

\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^k}{k!} + \cdots \qquad (38) \]

Similarly, \(e^B\) is defined 

\[ e^B = I + B + \frac{B^2}{2!} + \frac{B^3}{3!} + \cdots + \frac{B^k}{k!} + \cdots \qquad (39) \]


The power series (39) is well defined for all matrices. Although this 
power series of matrices may look forbidding, it is easy to use if we work 
in eigenvector-based coordinates so that B and its powers act like scalar 
multipliers (as in \(Bu = \lambda u\)). Recall Theorem 5 of Section 3.3, which said 
that if U is a matrix whose columns are different eigenvectors of B and if 
\(D_\lambda\) is a diagonal matrix of associated eigenvalues, then 

\[ B = UD_\lambda U^{-1} \qquad (40) \]

Substituting with (40) for B in (39), we obtain 

\[
\begin{aligned}
e^B &= I + UD_\lambda U^{-1} + \frac{UD_\lambda^2U^{-1}}{2!} + \frac{UD_\lambda^3U^{-1}}{3!} + \cdots + \frac{UD_\lambda^kU^{-1}}{k!} + \cdots && (41a) \\
    &= U\left(I + D_\lambda + \frac{D_\lambda^2}{2!} + \frac{D_\lambda^3}{3!} + \cdots + \frac{D_\lambda^k}{k!} + \cdots\right)U^{-1} && (41b) \\
    &= Ue^{D_\lambda}U^{-1} = U\begin{bmatrix} e^{\lambda_1} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_n} \end{bmatrix}U^{-1} && (41c)
\end{aligned}
\]

The reason that \(e^{D_\lambda}\) turns out to be simply a diagonal matrix with diagonal 
entries \(e^{\lambda_1}, e^{\lambda_2}, \ldots, e^{\lambda_n}\) is that in (41b), the matrices \(D_\lambda^k/k!\) are diagonal 
with entries \(\lambda_1^k/k!, \lambda_2^k/k!, \ldots\) and summing these matrices we get a matrix 
whose entry (1, 1) is \(1 + \lambda_1 + \lambda_1^2/2! + \lambda_1^3/3! + \cdots\), which equals 
\(e^{\lambda_1}\); and similarly for the other diagonal entries. 
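
As a numerical check on the eigenvector formula for the matrix exponential, the following sketch compares a truncated power series for the exponential of the matrix B of Example 5 with the product U times the diagonal matrix of exponentials times the inverse of U. It is an illustration in Python with NumPy, not part of the original text; the function names and the cutoff of 30 terms are our own choices, and the second function assumes the matrix has a full set of independent eigenvectors.

```python
import numpy as np

B = np.array([[2.0, -1.0],
              [-1.0, 2.0]])          # the matrix B of Example 5

def expm_series(M, terms=30):
    """Truncated power series I + M + M^2/2! + ... (first `terms` terms)."""
    total = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k          # term now equals M^k / k!
        total = total + term
    return total

def expm_eig(M):
    """U e^{D} U^{-1}, assuming M has a full set of independent eigenvectors."""
    eigvals, U = np.linalg.eig(M)    # columns of U are eigenvectors of M
    return U @ np.diag(np.exp(eigvals)) @ np.linalg.inv(U)

series = expm_series(B)
via_eig = expm_eig(B)
```

The two results should agree to within rounding error. The eigenvector form needs only n scalar exponentials, whereas the series needs a new matrix product for every term.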

Example 6. Converting a Second-Order 
Differential Equation into a Pair of 
First-Order Differential Equations 


Let us consider again the second-order equation from Example 4: 
\[ y''(t) = 6y'(t) - 8y(t) \qquad (42) \]

We convert (42) into a pair of first-order equations by introducing a 
second function x(t) defined 

\[ x(t) = y'(t) \quad\text{and thus}\quad x'(t) = y''(t) \qquad (43) \]

Now (42) can be written as the pair of first-order equations 

\[
\begin{aligned}
x'(t) &= 6x(t) - 8y(t) \\
y'(t) &= x(t)
\end{aligned}
\qquad (44)
\]

Defining the vector function u(t) = [x(t), y(t)], we have 

\[ u'(t) = Bu(t), \quad\text{where}\quad B = \begin{bmatrix} 6 & -8 \\ 1 & 0 \end{bmatrix} \qquad (45) \]

with initial conditions from Example 4 of u(0) = [-20, 100]. 
As in Example 5, the solution to (45) should be 

\[ u(t) = e^{Bt}u(0) \qquad (46) \]

where \(e^{Bt}\) is defined by the power series 

\[ e^{Bt} = I + tB + \frac{t^2B^2}{2!} + \frac{t^3B^3}{3!} + \cdots + \frac{t^kB^k}{k!} + \cdots \qquad (47) \]

Remember that we already know from Example 4 the solution 
of (42), so u(t) in (46) must equal 

\[ u(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = \begin{bmatrix} -440e^{4t} + 420e^{2t} \\ -110e^{4t} + 210e^{2t} \end{bmatrix} \qquad (48) \]

where \(x(t) = y'(t) = -440e^{4t} + 420e^{2t}\) is obtained by differentiating 
the solution for y(t). 

We now use the eigenvector-coordinates approach from Sections 
2.5 and 3.3 to show how the intimidating formula for u(t) in (46) is 
the same as (48). Since by (47) \(e^{Bt}\) involves powers of B, the 
computation of multiplying \(e^{Bt}\) times u(0) will be simplified if we express 
u(0) in terms of B's eigenvectors. 

In Section 3.1 we learned how to find the eigenvalues of a matrix 
B (they are the roots of the characteristic polynomial \(\det(B - \lambda I)\)) 
and from them, the associated eigenvectors. The characteristic 
polynomial for B is \(\lambda^2 - 6\lambda + 8\) and its roots are 4 and 2. Eigenvectors 
\(u_1, u_2\) of B associated with the eigenvalues 4 and 2 are (Exercise 13): 

\[ u_1 = [4, 1] \ \text{for}\ \lambda_1 = 4 \qquad u_2 = [2, 1] \ \text{for}\ \lambda_2 = 2 \]

Writing u(0) = [-20, 100] as a linear combination of \(u_1\) and \(u_2\) (we 
must solve the system \(u(0) = au_1 + bu_2\) for a and b; see Section 2.5 
for details), we obtain 

\[ u(0) = [-20, 100] = -110u_1 + 210u_2 \qquad (49) \]
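We now can compute \(e^{Bt}u(0)\), which we rewrite using (47) as 

\[ u(t) = e^{Bt}u(0) = Iu(0) + tBu(0) + \frac{t^2}{2!}B^2u(0) + \cdots + \frac{t^k}{k!}B^ku(0) + \cdots \qquad (50) \]

Substituting \(u(0) = -110u_1 + 210u_2\) in (50), we have 

\[
\begin{aligned}
u(t) &= I(-110u_1 + 210u_2) + tB(-110u_1 + 210u_2) + \frac{t^2}{2!}B^2(-110u_1 + 210u_2) + \cdots \\
     &\qquad + \frac{t^k}{k!}B^k(-110u_1 + 210u_2) + \cdots \\
     &= -110\left[Iu_1 + tBu_1 + \frac{t^2}{2!}B^2u_1 + \cdots + \frac{t^k}{k!}B^ku_1 + \cdots\right] \\
     &\qquad + 210\left[Iu_2 + tBu_2 + \frac{t^2}{2!}B^2u_2 + \cdots + \frac{t^k}{k!}B^ku_2 + \cdots\right]
\end{aligned}
\qquad (51)
\]

But since \(u_1\) and \(u_2\) are eigenvectors, the term \(B^ku_1\) equals \(4^ku_1\), and 
\(B^ku_2 = 2^ku_2\). So (51) becomes 

\[
\begin{aligned}
u(t) &= -110\left[u_1 + 4tu_1 + \frac{t^24^2}{2!}u_1 + \cdots + \frac{t^k4^k}{k!}u_1 + \cdots\right]
        + 210\left[u_2 + 2tu_2 + \frac{t^22^2}{2!}u_2 + \cdots + \frac{t^k2^k}{k!}u_2 + \cdots\right] \\
     &= -110\left[1 + 4t + \frac{t^24^2}{2!} + \cdots + \frac{t^k4^k}{k!} + \cdots\right]u_1
        + 210\left[1 + 2t + \frac{t^22^2}{2!} + \cdots + \frac{t^k2^k}{k!} + \cdots\right]u_2 \\
     &= -110e^{4t}u_1 + 210e^{2t}u_2 \\
     &= -110e^{4t}\begin{bmatrix} 4 \\ 1 \end{bmatrix} + 210e^{2t}\begin{bmatrix} 2 \\ 1 \end{bmatrix}
      = \begin{bmatrix} -440e^{4t} + 420e^{2t} \\ -110e^{4t} + 210e^{2t} \end{bmatrix}
\end{aligned}
\qquad (52)
\]

Observe that for a different starting vector u*(0), we would get 
\(u^*(0) = a'u_1 + b'u_2\), for some a', b', and then the result in (52) 
would be \(u(t) = a'e^{4t}u_1 + b'e^{2t}u_2\). 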
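We now give a shorter derivation of this result using the matrix 
formula in (41) to handle the exponential series. For \(e^{Bt}\), (41) becomes 

\[ e^{Bt} = Ue^{D_\lambda t}U^{-1} \qquad (53) \]

where \(e^{D_\lambda t}\) is a diagonal matrix with diagonal entries \(e^{\lambda_i t}\). Recall that 
U has eigenvectors \(u_1\) and \(u_2\) as its columns. Thus 

\[ U = \begin{bmatrix} 4 & 2 \\ 1 & 1 \end{bmatrix} \quad\text{and we compute}\quad U^{-1} = \begin{bmatrix} \tfrac12 & -1 \\ -\tfrac12 & 2 \end{bmatrix} \qquad (54) \]

Using (53) to substitute for \(e^{Bt}\) in \(u(t) = e^{Bt}u(0)\), we obtain 

\[
\begin{aligned}
u(t) = e^{Bt}u(0) &= Ue^{D_\lambda t}U^{-1}u(0) \\
 &= \begin{bmatrix} 4 & 2 \\ 1 & 1 \end{bmatrix}
    \begin{bmatrix} e^{4t} & 0 \\ 0 & e^{2t} \end{bmatrix}
    \begin{bmatrix} \tfrac12 & -1 \\ -\tfrac12 & 2 \end{bmatrix}
    \begin{bmatrix} -20 \\ 100 \end{bmatrix} \\
 &= \begin{bmatrix} 4 & 2 \\ 1 & 1 \end{bmatrix}
    \begin{bmatrix} e^{4t} & 0 \\ 0 & e^{2t} \end{bmatrix}
    \begin{bmatrix} -110 \\ 210 \end{bmatrix} \\
 &= \begin{bmatrix} 4 & 2 \\ 1 & 1 \end{bmatrix}
    \begin{bmatrix} -110e^{4t} \\ 210e^{2t} \end{bmatrix}
  = \begin{bmatrix} -440e^{4t} + 420e^{2t} \\ -110e^{4t} + 210e^{2t} \end{bmatrix}
\end{aligned}
\qquad (55)
\]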

The conversion of (42) to a system of first-order differential equations 
can be applied to any linear higher-order differential equation. 
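
The eigenvector computation of Example 6 can be reproduced numerically. The sketch below is an illustration in Python with NumPy (our own choice of tool and names, not the book's). Note that NumPy scales its eigenvectors to unit length, so the coordinates it finds differ from -110 and 210 by scale factors, but the resulting u(t) should be the same.

```python
import numpy as np

B = np.array([[6.0, -8.0],
              [1.0, 0.0]])          # matrix B of Example 6
u0 = np.array([-20.0, 100.0])       # u(0) = [y'(0), y(0)]

eigvals, U = np.linalg.eig(B)       # columns of U are (unit-length) eigenvectors
coeffs = np.linalg.solve(U, u0)     # coordinates of u(0) in the eigenvector basis

def u(t):
    """u(t) = U e^{Dt} U^{-1} u(0), computed in eigenvector coordinates."""
    return U @ (np.exp(eigvals * t) * coeffs)

def u_closed_form(t):
    """The solution [x(t), y(t)] already known from Example 4."""
    return np.array([-440 * np.exp(4 * t) + 420 * np.exp(2 * t),
                     -110 * np.exp(4 * t) + 210 * np.exp(2 * t)])
```

Evaluating both functions at a few values of t is a quick way to confirm that the two derivations agree.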


Example 7. Converting a Third-Order Differential 
Equation into a System of Three 
First-Order Differential Equations 


Consider the third-order linear differential equation 
\[ y'''(t) = y''(t) + 2y'(t) + 3y(t) \qquad (56) \]
We introduce the two new functions w(t) and z(t): 
\[ w(t) = y'(t) \quad\text{and}\quad z(t) = w'(t) \ [= y''(t)] \qquad (57) \]
Then (56) can be written 
\[ z'(t) = z(t) + 2w(t) + 3y(t) \qquad (58) \]

Defining the vector function u(t) = [z(t), w(t), y(t)], we can rewrite 
(57) and (58) as the matrix equation 

\[ u'(t) = Bu(t): \quad \begin{bmatrix} z'(t) \\ w'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} z(t) \\ w(t) \\ y(t) \end{bmatrix} \qquad (59) \]

The solution to (59) is \(u(t) = e^{Bt}u(0)\), which we would evaluate using 
eigenvectors, as discussed in Example 6. 

Section 4.3 Exercises 


Summary of Exercises 

Exercises 1-4 are chemical reaction balancing problems. Exercises 5-7 are 
electrical circuit problems. Exercises 8-18 involve differential equations, 
with Exercises 16-18 being theory questions. 


1. Write out a system of equations required to balance the following chem- 
ical reactions and solve. Here C represents carbon, N represents nitro- 
gen, H represents hydrogen, and O represents oxygen. 

(a) NH, + N,O,— N, + H,O 
(b) C,H; + O, —~ CO, + H,O 


2. Write out a system of equations required to balance the following chem- 
ical reaction and solve. 


SO, + NO, + HO—> H + SO, + NO 


where S represents sulfur, N represents nitrogen, H represents hydro- 
gen, and O represents oxygen. 


3. Write out a system of equations required to balance the following chem- 
ical reaction and solve. 


PbN, + CrMn,0, > Cr,0; + MnO, + Pb,O, + NO 


where Pb represents lead, N represents nitrogen, Cr represents chro- 
mium, Mn represents manganese, and O represents oxygen. 


4. Write out a system of equations required to balance the following chem- 
ical reaction and solve. 


H,SO, + MnS + As,Cr,,03,;— HMnO, + AsH, + CrS,0,, + H,O 


where H represents hydrogen, S represents sulfur, O represents oxygen, 


Mn represents manganese, As represents arsenic, and Cr represents 
chromium. 


5. Determine the currents in each branch of the following circuit. 


[Figure: circuit with a 10-volt battery] 


6. Determine the currents in each branch of the following circuits, in which 
the incoming amperage (on the left) is given. 


[Figure: circuits (a) and (b)] 


7. Determine the currents in each branch of the following circuit. The 
voltage in the battery is 19. 


8. Solve the following first-order differential equations, with given initial 
values. 
(a) y'(t) = .5y(t), y(0) = 100 
(b) y'(t) - 4y(t) = 0, y(0) = 10 


9. Suppose that a population of bacteria is continuously doubling its size 
every unit of time. Write a differential equation for y(t), the size of the 
population. 


10. Solve the following second-order differential equations, with given 
initial values. Use the method based on the characteristic equation (see 
Example 4). 

(a) y''(t) = 5y'(t) - 4y(t), y(0) = 20, y'(0) = 5 

(b) y''(t) = -5y'(t) + 6y(t), y(0) = 1, y'(0) = 15 

(c) y''(t) = 2y'(t) + 8y(t), y(0) = 2, y'(0) = 0 


11. Convert the following differential equations into systems of 
simultaneous first-order differential equations. Do not solve. 

(a) y''(t) = 5y'(t) - 4y(t) 

(b) y''(t) = -5y'(t) - 6y(t) 

(c) y'''(t) = 4y''(t) + 3y'(t) - 2y(t) 

(d) y"(t) = 2y(t) + y(t) 


12. Check that 3 and 1 are the eigenvalues for 
\( B = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \) in Example 5 
and that associated eigenvectors are [1, -1] and [1, 1]. Solve 
the system of differential equations in Example 5 using the method in 
Example 6. 


13. Check that 4 and 2 are the eigenvalues for 
\( B = \begin{bmatrix} 6 & -8 \\ 1 & 0 \end{bmatrix} \) in Example 6 
and that associated eigenvectors are [4, 1] and [2, 1]. Also verify 
(49). 


14. Re-solve the second-order differential equations in Exercise 10 by con- 
verting them to a pair of first-order differential equations and solving 
by the method in Example 6 (you must find the eigenvalues and eigen- 
vectors). 


15. Solve the following pairs of first-order differential equations by using 
the solution technique in Example 6 (you must find the eigenvalues and 
eigenvectors). The initial condition is x(0) = y(0) = 10. 

(a) x'(t) = 4x(t), y'(t) = 2x(t) + 2y(t) 
(b) x'(t) = 2x(t) + y(t), y'(t) = 2x(t) + 3y(t) 
(c) x'(t) = x(t) + 4y(t), y'(t) = 2x(t) + 3y(t) 


16. (a) Show that any multiple \(ry^*(t)\) of a solution \(y^*(t)\) to a second-order 
differential equation y''(t) = ay'(t) + by(t) is again a solution. 
(b) Show that any linear combination \(ry^*(t) + sy^\circ(t)\) of solutions \(y^*(t)\), 
\(y^\circ(t)\) to y''(t) = ay'(t) + by(t) is again a solution. 


17. Suppose that \(y^\circ(t)\) is some solution to y''(t) - ay'(t) - by(t) = f(t) 
and \(y^*(t)\) is a solution to y''(t) - ay'(t) - by(t) = 0. Then show that 
for any r, \(y^\circ(t) + ry^*(t)\) is also a solution to y''(t) - ay'(t) - by(t) = 
f(t). 


18. Verify that \(y(t) = te^{\lambda t}\) is a solution to the differential equation y''(t) = 
cy'(t) + dy(t) whose characteristic equation \(k^2 - ck - d = 0\) has \(\lambda\) 
as its double root. 


Note: If \(k^2 - ck - d = 0\) has \(\lambda\) as a double root, the characteristic 
equation can be factored as \((k - \lambda)^2 = 0\). This means that \(c = 2\lambda\) 
and \(d = -\lambda^2\). Use these values for c and d in verifying that \(te^{\lambda t}\) is a 
solution. 


Section 4.4. Markov Chains 


Markov chains were introduced in Section 1.3. They are probability models 
for simulating the behavior of a system that randomly moves among different 
"states" over successive periods of time. If a Markov chain is currently in 
state \(S_j\), there is a transition probability \(a_{ij}\) that 1 unit of time later it will be 
in state \(S_i\). The matrix A of transition probabilities completely describes the 
Markov chain. If \(p = [p_1, p_2, \ldots, p_n]\) is the vector giving the probabilities 
\(p_j\) that \(S_j\) is the current state of the chain and \(p' = [p_1', p_2', \ldots, p_n']\) is the 
vector of probabilities \(p_i'\) that \(S_i\) is the next state of the chain, then we have 

\[ p' = Ap \qquad (1a) \]

For a particular \(p_i'\), this is 

\[ p_i' = a_{i1}p_1 + a_{i2}p_2 + \cdots + a_{in}p_n \qquad (1b) \]


Example 1. Frog in Highway Revisited 


In Section 1.3 we considered the Markov chain for a frog wandering 
across a highway that was divided into six states. The transition matrix 
A was 


                 Current State 
               1     2     3     4     5     6 
          1   .50   .25    0     0     0     0 
          2   .50   .50   .25    0     0     0 
   Next   3    0    .25   .50   .25    0     0        (2) 
   State  4    0     0    .25   .50   .25    0 
          5    0     0     0    .25   .50   .50 
          6    0     0     0     0    .25   .50 


We started with probability vector p = [1, 0, 0, 0, 0, 0], that 
is, the frog started in state 1. We computed a table of the probability 
distributions after varying numbers of periods. In matrix notation, we 
computed 

\[ p^{(k)} = A^kp \]

for increasing values of k. We found that as k got large, \(p^{(k)}\) converged 
to the probability vector 

\[ p^* = [.1, .2, .2, .2, .2, .1] \qquad (3) \]

which satisfied the equation 

\[ p^* = Ap^* \qquad (4) \]

Thus p* is a stable (unchanging) probability distribution for this 
Markov chain. In matrix terminology, p* is an eigenvector of A with 
associated eigenvalue 1. 


In Section 3.5 we solved the eigenvector equations (4) (actually, 
we solved (A - I)p = 0) and obtained a general solution of the form 

\[ [q, 2q, 2q, 2q, 2q, q] \qquad (5) \]

Making the components in (5) sum to 1 (to be a probability 
distribution), we obtained p* = [.1, .2, .2, .2, .2, .1]. So this p* is the 
unique stable distribution of the frog Markov chain. The fact was 
brought out in the Exercises of Section 1.3 that if the starting 
probability vector p had been different, we still would have found that \(p^{(k)}\) 
converged to this p*. 

Suppose that the starting probability vector were the jth unit 
vector \(e_j\), with a 1 in the jth position and 0's elsewhere (the original p 
was \(e_1\)). It was noted in Section 2.4 that for any matrix B, 

\[ Be_j = b_j^C \qquad (b_j^C = \text{the jth column of } B) \]

Then 

\[ p^{(k)} = A^ke_j = (A^k)_j^C \quad (\text{the jth column of } A^k) \qquad (6) \]

Since \(p^{(k)}\) converges to p*, we conclude that the jth column of \(A^k\) 
approaches p*, for k large, 

\[ A^k \longrightarrow \begin{bmatrix} .1 & .1 & .1 & .1 & .1 & .1 \\ .2 & .2 & .2 & .2 & .2 & .2 \\ .2 & .2 & .2 & .2 & .2 & .2 \\ .2 & .2 & .2 & .2 & .2 & .2 \\ .2 & .2 & .2 & .2 & .2 & .2 \\ .1 & .1 & .1 & .1 & .1 & .1 \end{bmatrix} \qquad (7) \]

Does this property of any starting probability vector converging to a 
stable probability distribution hold true for all Markov chains? The answer 
is no. 


Example 2. Markov Chain Not Converging to 
Stable Distribution 


Consider the simple two-state Markov chain with transition matrix and 
starting vector 

\[ A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad p = [1, 0] \qquad (8) \]

It is easy to check that \(p^{(k)} = [0, 1]\) for k odd, and \(p^{(k)} = [1, 0]\) for 
k even. More generally, for a starting vector of p = [r, 1 - r] for 
any r, \(0 \le r \le 1\), we have \(p^{(k)} = [1 - r, r]\) for k odd, and \(p^{(k)} = 
[r, 1 - r]\) for k even. Note that \(p^\circ = [.5, .5]\) is a stable vector (an 
eigenvector with eigenvalue 1), but this Markov chain will not 
converge to \(p^\circ\); if we do not start at \(p^\circ\), we never get to \(p^\circ\). 

The powers of A have a similar odd-even cyclic pattern, with 
\(A^k = A\) for k odd, and \(A^k = I\) for k even. 

So now the question is: Under what conditions does a Markov transition 
matrix A have a stable probability vector to which any starting vector will 
converge, or equivalently, when will the columns in powers of A all 
converge to a given probability vector as in (7)? The question is not tied to the 
existence of an eigenvector with eigenvalue 1, since in Example 2, [.5, .5] 
was such an eigenvector, but there was no convergence. Instead, the answer 
depends on the absence of any cyclic or other nonrandom pattern, as seen 
in Example 2. 


Definition. A Markov chain with transition matrix A is regular if for some 
positive integer h, the matrix \(A^h\) has all positive entries. 


The matrix in Example 2 was not regular. A regular Markov chain 
mixes, or randomizes, patterns so as to eliminate any cyclic behavior. If a 
Markov chain is regular, then every column of \(A^h\) has all positive entries, 
meaning that starting from state j it is possible after h periods to be in any 
of the states. The following theorem requires a lengthy, but not advanced, 
proof that may be found in any of the texts on Markov chains listed in the 
References. 


Theorem 1. Every regular Markov chain with transition matrix A has a 
stable probability vector p* to which \(p^{(k)} = A^kp\) converges, for any 
probability vector p. All the columns of \(A^k\) also converge to p*. 
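
The definition of regularity can be tested mechanically by examining successive powers of A. The sketch below is an illustration in Python with NumPy, with a function name of our own choosing. It tracks only which entries are positive, and it stops at the power (n - 1)^2 + 1; that stopping point is Wielandt's bound from the theory of nonnegative matrices, a fact not developed in this book.

```python
import numpy as np

def is_regular(A, max_power=None):
    """Return True if some power A^h, 1 <= h <= max_power, has all positive entries."""
    n = A.shape[0]
    if max_power is None:
        max_power = (n - 1) ** 2 + 1        # Wielandt's bound (assumption stated above)
    pattern = (A > 0).astype(int)           # 1 where a transition is possible
    P = np.eye(n, dtype=int)
    for _ in range(max_power):
        P = ((P @ pattern) > 0).astype(int)  # positivity pattern of the next power
        if P.all():
            return True
    return False

cyclic = np.array([[0.0, 1.0],
                   [1.0, 0.0]])              # the chain of Example 2
mixing = np.array([[0.9, 0.5],
                   [0.1, 0.5]])              # a hypothetical two-state chain
```

Here `is_regular(cyclic)` should be False, since the powers alternate between A and I, while `is_regular(mixing)` should be True already at h = 1.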


One way to find the stable distribution of a regular Markov chain is, 
as done in Section 3.5, by solving (A - I)p = 0 and then picking the 
constant in the solution to make the components sum to 1 [see equation (5)]. 
Another approach is to add the additional constraint \(\mathbf{1} \cdot p \ (= \sum_i p_i) = 1\). 

\[
\begin{aligned}
(A - I)p &= 0 \\
\mathbf{1} \cdot p &= 1
\end{aligned}
\qquad (9)
\]

This is a set of n + 1 equations in n unknowns. 


Example 3. Solving for Stable Distribution 


Consider a simpler Markov chain which involves just two states that 
represent two islands, isle 1 and isle 2, in an isolated country. We are 
interested in the flow of money between these two islands. Assume 
that no money enters or leaves the country. Then a Markov chain 
should provide a reasonable model for currency flow. We shall perform 
a general analysis of this model rather than use specific values for the 
transition probabilities \(a_{ij}\). Since columns must sum to 1, the transition 
matrix A can be written in terms of the off-diagonal entries thus: 


                    Current State 
                      1        2 
   A =  Next   1    1 - b      a 
        State  2      b      1 - a 


We require 0 < a, b < 1 so that the Markov chain will be regular. 
Let us solve (9) for this A. 

\[
\begin{aligned}
(A - I)p = 0: \quad & -bp_1 + ap_2 = 0 \\
                    & \phantom{-}bp_1 - ap_2 = 0 \\
\mathbf{1} \cdot p = 1: \quad & \phantom{-b}p_1 + p_2 = 1
\end{aligned}
\qquad (10)
\]

The second equation in (10) is just the first equation multiplied by -1. 
So the second equation is redundant, and we are back to the standard 
situation of two equations in two unknowns. When solved, they yield 
the stable distribution 

\[ p_1 = \frac{a}{a + b}, \qquad p_2 = \frac{b}{a + b} \qquad (11) \]

Note that (11) will always be well defined unless a and b are both 0, 
in which case we get the trivial Markov chain: \(p_1' = p_1\), \(p_2' = p_2\). 


For any Markov transition matrix, the system (A - I)p = 0 always 
has redundancy because the sum of the left-hand sides of all the equations 
is 0 (A has column sums of 1, but the -I term makes the column sums of 
A - I equal to 0) or, equivalently, the last row is minus the sum of all the 
preceding rows. When such redundancy exists, the last row will be zeroed 
out in Gaussian elimination (the reasons for this are discussed in Section 
5.2). Thus the last row can be replaced by the constraint \(\mathbf{1} \cdot p = 1\), as 
implicitly happened in Example 3. 
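
Replacing the redundant last row of A - I by the constraint that the probabilities sum to 1 can be sketched as follows. This is an illustration in Python with NumPy; the function name and the particular values a = .2 and b = .3 are our own, chosen only to exercise the two-island matrix of Example 3. The sketch assumes the chain is regular, so that the resulting square system has a unique solution.

```python
import numpy as np

def stable_distribution(A):
    """Solve (A - I)p = 0 with the last equation replaced by sum(p) = 1."""
    n = A.shape[0]
    M = A - np.eye(n)
    M[-1, :] = 1.0                 # redundant last row becomes 1 . p = 1
    rhs = np.zeros(n)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)

a, b = 0.2, 0.3                    # hypothetical values with 0 < a, b < 1
A = np.array([[1 - b, a],
              [b, 1 - a]])         # two-island matrix of Example 3
p_star = stable_distribution(A)    # should equal [a/(a+b), b/(a+b)]
```

With these values the result should be [.4, .6], in agreement with the formula a/(a + b), b/(a + b).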

However, Gaussian elimination is so simple in tridiagonal systems, 
like the frog Markov chain, that adding this new row creates as much trouble 
as it saves. We illustrate the advantage of a tridiagonal matrix with the 
following large-scale example. 


Example 4. An n-State Frog Markov Chain 


Let us generalize the frog Markov chain to a chain with n states, where 
n is an arbitrary number. The system of equations (A - I)p = 0 is 

\[
\begin{aligned}
-.50p_1 + .25p_2 &= 0 \\
.50p_1 - .50p_2 + .25p_3 &= 0 \\
.25p_2 - .50p_3 + .25p_4 &= 0 \\
&\ \ \vdots \\
.25p_{n-3} - .50p_{n-2} + .25p_{n-1} &= 0 \\
.25p_{n-2} - .50p_{n-1} + .50p_n &= 0 \\
.25p_{n-1} - .50p_n &= 0
\end{aligned}
\qquad (12)
\]

Performing elimination in the first column requires us simply to add 
the first equation to the second. 

\[
\begin{aligned}
-.50p_1 + .25p_2 &= 0 \\
-.25p_2 + .25p_3 &= 0 \\
.25p_2 - .50p_3 + .25p_4 &= 0 \\
&\ \ \vdots \\
.25p_{n-3} - .50p_{n-2} + .25p_{n-1} &= 0 \\
.25p_{n-2} - .50p_{n-1} + .50p_n &= 0 \\
.25p_{n-1} - .50p_n &= 0
\end{aligned}
\]

Similarly, for elimination in the second column we add the second 
equation to the third. 

\[
\begin{aligned}
-.50p_1 + .25p_2 &= 0 \\
-.25p_2 + .25p_3 &= 0 \\
-.25p_3 + .25p_4 &= 0 \\
.25p_3 - .50p_4 + .25p_5 &= 0 \\
&\ \ \vdots \\
.25p_{n-3} - .50p_{n-2} + .25p_{n-1} &= 0 \\
.25p_{n-2} - .50p_{n-1} + .50p_n &= 0 \\
.25p_{n-1} - .50p_n &= 0
\end{aligned}
\]

The situation when we come to perform elimination in the third column 
is the same as in the second column and again involves adding the 
third equation to the fourth. This situation will stay the same for every 
column from the second through the (n - 1)st. After elimination in 
the first n - 1 columns, we have 

\[
\begin{aligned}
-.50p_1 + .25p_2 &= 0 \\
-.25p_2 + .25p_3 &= 0 \\
&\ \ \vdots \\
-.25p_{n-2} + .25p_{n-1} &= 0 \\
-.25p_{n-1} + .50p_n &= 0 \\
0 &= 0
\end{aligned}
\qquad (13)
\]

Note that the (n - 1)st equation in (13) is the negative of the original 
last equation, so the last equation is zeroed out when we perform 
elimination in the (n - 1)st column. For concreteness, the reader may 
want to refer back to Example 1 of Section 3.5, where we solved this 
system for the original frog Markov chain, where n = 6. 

Letting \(p_n = q\), we perform back substitution and find from the 
(n - 1)st equation in (13) that \(.25p_{n-1} = .50p_n\ (= .50q)\), so 
\(p_{n-1} = 2q\). Equations 2 through n - 2 in (13) say that successive 
\(p_j, p_{j+1}\) pairs from \(p_2\) through \(p_{n-1}\) are equal. Since \(p_{n-1} = 2q\), all 
these \(p_j\)'s equal 2q. Finally, we see that \(p_1 = q\). So our solution has 
the form 

\[ [q, 2q, 2q, \ldots, 2q, 2q, q] \]

The sum of the entries in this general solution is (2n - 2)q. For this 
sum to equal 1, we require that q = 1/(2n - 2). So our stable 
distribution is 

\[ p^* = \left[\frac{1}{2n-2}, \frac{2}{2n-2}, \frac{2}{2n-2}, \ldots, \frac{2}{2n-2}, \frac{2}{2n-2}, \frac{1}{2n-2}\right] \]

The effect of replacing the last equation in (12) by \(\mathbf{1} \cdot p = 1\) is 
discussed in the Exercises. 
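
The closed-form stable distribution of Example 4 can be checked for several values of n. The sketch below is an illustration in Python with NumPy; the helper names are our own, and the transition matrix is built from the coefficients visible in (12): interior states stay with probability .50 and move to each neighbor with probability .25, while each end state stays with probability .50 and moves inward with probability .50.

```python
import numpy as np

def frog_matrix(n):
    """Transition matrix of the n-state frog chain (each column sums to 1)."""
    A = np.zeros((n, n))
    for j in range(n):
        A[j, j] = 0.50
        if j > 0:
            A[j - 1, j] = 0.25
        if j < n - 1:
            A[j + 1, j] = 0.25
    A[1, 0] = 0.50                 # from the first state the frog moves inward
    A[n - 2, n - 1] = 0.50         # likewise from the last state
    return A

def frog_stable(n):
    """Closed form from Example 4: [1, 2, ..., 2, 1] / (2n - 2)."""
    p = np.full(n, 2.0)
    p[0] = p[-1] = 1.0
    return p / (2 * n - 2)
```

For n = 6, `frog_stable(6)` should reproduce [.1, .2, .2, .2, .2, .1], and for any n the product `frog_matrix(n) @ frog_stable(n)` should return the same vector.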


Next we consider an important type of nonregular Markov chain, called 
an absorbing Markov chain. A state \(S_i\) in a Markov chain is called an 
absorbing state if \(a_{ii} = 1\), that is, once you enter state \(S_i\) you never leave it. 
A Markov chain with one or more absorbing states is called an absorbing 
Markov chain. 


Example 5. A Gambling Model with 
Absorbing States 


Absorbing states complicate the behavior of a Markov chain and lead 
to a variety of different stable probabilities. Consider the following 
Markov chain for gambling, with states representing the gambler’s 
winnings. Each round, the gambler has a probability .3 of winning $1, 
.33 of losing $1, and .37 of staying the same. The gambler stops if 
he or she loses all of the money, and also stops if the winnings reach 
$6. So 0 and 6 will be absorbing states in this Markov chain. 


                    Current State 
               0     1     2     3     4     5     6 
          0    1    .33    0     0     0     0     0 
          1    0    .37   .33    0     0     0     0 
          2    0    .30   .37   .33    0     0     0 
   Next   3    0     0    .30   .37   .33    0     0        (14) 
   State  4    0     0     0    .30   .37   .33    0 
          5    0     0     0     0    .30   .37    0 
          6    0     0     0     0     0    .30    1 


After playing a very long time, a person is certain to have 
stopped, either having gone broke or having won. Thus the stable 
probabilities should only involve the absorbing states 0 and 6, that is, 
the stable vector p* has the form 

\[ p_0^* = p, \quad p_6^* = 1 - p, \quad\text{and}\quad p_1^* = p_2^* = p_3^* = p_4^* = p_5^* = 0 \qquad (15) \]

By looking at the transition probabilities for states 0 and 6 alone, 


               Current State 
                 0     6 
   Next    0     1     0 
   State   6     0     1 


it is easy to see that any p* of the form in (15), with \(0 \le p \le 1\), is a 
stable vector. 

Before doing any mathematical analysis of absorbing Markov 
chains, let us explore the behavior of (14) by letting a computer 
program produce a table of probability distributions over many rounds 
when we start with $3 (Table 4.2). Since $3 is halfway between losing and 
winning, we could expect the chances of losing to be close to .5, but a 
little above .5, since there is always a .33-versus-.30 bias toward losing 
a dollar in any state. 


Table 4.2  Probability Distribution After k Rounds 

                            State 
   Rounds     0      1      2      3      4      5      6 
      0       0      0      0      1      0      0      0 
      1       0      0     .33    .37    .30     0      0 
      2       0     .109   .244   .335   .222   .090    0 
      3      .036   .121   .234   .270   .212   .100   .027 
      4      .076   .122   .212   .240   .193   .101   .057 
      5      .116   .115   .194   .216   .177   .095   .087 
      6      .154   .107   .178   .196   .161   .088   .116 
      8      .222   .090   .149   .164   .135   .074   .166 
     10      .278   .075   .125   .137   .113   .062   .209 
     15      .383   .048   .080   .088   .072   .040   .288 
     20      .451   .031   .051   .056   .047   .026   .336 
     25      .494   .020   .033   .036   .030   .016   .371 
     50      .563   .002   .003   .004   .003   .002   .423 
     75      .570   ~0     ~0     ~0     ~0     ~0     .428 


So if we start with $3, the probability of eventually losing (before 
we reach $6) is p = .571. Instead of repeating this computer simulation 
for other starting values, we shall now develop a theory that lets us 
calculate directly the probability of losing or winning, when we start 
with different amounts of money. 
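
A computation of the kind that produces such a table can be sketched in a few lines. The following is an illustration in Python with NumPy, not the program used for the book; the function and variable names are our own. It builds the gambling transition matrix from the stated probabilities (.30 of winning $1, .33 of losing $1, .37 of staying the same) and repeatedly multiplies the current distribution by A.

```python
import numpy as np

def gambling_matrix(win=0.30, lose=0.33, top=6):
    """Transition matrix for the gambling chain; column j is the current state."""
    stay = 1.0 - win - lose
    A = np.zeros((top + 1, top + 1))
    A[0, 0] = A[top, top] = 1.0          # absorbing states $0 and $top
    for j in range(1, top):
        A[j - 1, j] = lose
        A[j, j] = stay
        A[j + 1, j] = win
    return A

A = gambling_matrix()
p = np.zeros(7)
p[3] = 1.0                               # start with $3

history = {0: p.copy()}
for k in range(1, 76):
    p = A @ p                            # distribution after k rounds
    history[k] = p.copy()
```

Printing `history[k]` for the values of k listed in the table gives the corresponding rows, rounded to three places.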


The first step in our development is to divide the states of an absorbing 
Markov chain into two groups, the absorbing states and the nonabsorbing 
states. Assume that there are r absorbing states and s nonabsorbing states. 
If the absorbing states are listed first, the transition matrix A can be parti- 
tioned into the form 


                Ab   NAb 
   A =   Ab  [   I    R  ]        (16) 
         NAb [   O    Q  ] 


where I is an r-by-r identity matrix, O is an s-by-r matrix of 0's, R is an 
r-by-s matrix with entry \(r_{ij}\) giving the probability of going from nonabsorbing 
state j to absorbing state i, and Q is the s-by-s transition matrix among the 
nonabsorbing states. The transition matrix (14) in Example 5 becomes 


               0     6  |   1     2     3     4     5 
          0    1     0  |  .33    0     0     0     0 
          6    0     1  |   0     0     0     0    .30 
         ---------------+------------------------------ 
          1    0     0  |  .37   .33    0     0     0 
   A =    2    0     0  |  .30   .37   .33    0     0        (17) 
          3    0     0  |   0    .30   .37   .33    0 
          4    0     0  |   0     0    .30   .37   .33 
          5    0     0  |   0     0     0    .30   .37 


Using the rule for matrix multiplication of a partitioned matrix from Section 
2.6 (just treat the submatrices like individual entries), we have 

\[
A^2 = \begin{bmatrix} I & R \\ O & Q \end{bmatrix}\begin{bmatrix} I & R \\ O & Q \end{bmatrix}
    = \begin{bmatrix} II + RO & IR + RQ \\ OI + QO & OR + QQ \end{bmatrix}
    = \begin{bmatrix} I & R + RQ \\ O & Q^2 \end{bmatrix}
\qquad (18)
\]

and multiplying A times \(A^2\), we find that 

\[
A^3 = \begin{bmatrix} I & R \\ O & Q \end{bmatrix}\begin{bmatrix} I & R + RQ \\ O & Q^2 \end{bmatrix}
    = \begin{bmatrix} I & R + RQ + RQ^2 \\ O & Q^3 \end{bmatrix}
\]

It is left as an exercise to check that for higher powers of A, the partitioned 
form is 

\[ A^k = \begin{bmatrix} I & R_k^* \\ O & Q^k \end{bmatrix} \qquad (19) \]

where \(R_k^* = R + RQ + RQ^2 + \cdots + RQ^{k-1}\). 


Note that \(Q^k\) is the standard kth power of the nonabsorbing transition 
matrix Q, for play among the nonabsorbing states. 

As k gets large, the entries in \(Q^k\) will approach 0, since over time the 
probability of not getting absorbed approaches 0. The important submatrix 
in (19) is \(R_k^*\). Entry (i, j) of \(R_k^*\) is the probability of being in absorbing state 
i after k rounds if we start in nonabsorbing state j. Let us explain what this 
probability is in detail. To go from nonabsorbing state j to absorbing state 
i after k rounds, we can either go immediately on the first round from j to 
i, with probability \(r_{ij}\) (and remain in absorbing state i), or we can wander 
among the nonabsorbing states for several rounds, ending up after w rounds 
in nonabsorbing state h, with probability given by entry (h, j) in \(Q^w\), and 
then go from state h to absorbing state i, with probability \(r_{ih}\) (and thereafter 
remain in state i). The total probability of starting in a nonabsorbing state 
j, wandering among nonabsorbing states for w rounds, and then going from 
some nonabsorbing state to absorbing state i is given by entry (i, j) in \(RQ^w\). 
Since the number w can range up to k - 1, we obtain the sum for \(R_k^*\) given 
in (19). 

The limiting matrix R* for \(R_k^*\) as k approaches infinity will give the 
probabilities \(r_{ij}^*\) that starting in nonabsorbing state j we eventually end up 
in absorbing state i. 


\[ R^* = R + RQ + RQ^2 + \cdots = R(I + Q + Q^2 + \cdots) \qquad (20) \]

R* is the matrix that would tell us in the gambling model the probability of 
eventually losing or winning, when we start with different amounts. 

There are two ways to compute R*. The first way is to compute \(Q^k\) 
for all k up to some large number, say 50, and sum these matrices and 
multiply by R to obtain R* as in (20). The other way rewrites R* as 

\[ R^* = R(I + Q + Q^2 + \cdots) = R(I - Q)^{-1} \qquad (21) \]

using the geometric series identity introduced in equation (7) of Section 3.4. 
We call \((I - Q)^{-1}\) the fundamental matrix of an absorbing Markov 
chain and use the matrix N to denote it. 

\[ N = I + Q + Q^2 + \cdots = (I - Q)^{-1} \quad\text{and}\quad R^* = RN \qquad (22) \]

We can calculate this inverse using elimination by pivoting. N is given 
in (24). 
The geometric series identity used in (21) required that \(\|Q\| < 1\) for 
some matrix norm. The sum norm \(\|Q\|_s\) (largest column sum) of Q in (17) 
is 1, and the max norm (largest row sum) is also 1. However, \(\|Q\| < 1\) in the 
euclidean norm. 

The matrix N contains some very useful information by itself. It tells 
us the expected number of times we will visit (nonabsorbing) state i if we 
start in (nonabsorbing) state j. The reasoning is as follows. The average 
number of times we visit state i starting from state j after exactly one round 
is simply \(0(1 - q_{ij}) + 1q_{ij} = q_{ij}\), the weighted average of visiting state i 
zero times and of visiting state i one time. The average number of times we 
visit state i starting from state j after exactly two rounds is \(0(1 - q_{ij}^{(2)}) + 
1q_{ij}^{(2)} = q_{ij}^{(2)}\), where \(q_{ij}^{(2)}\) denotes entry (i, j) in \(Q^2\). The average number of 
visits after exactly k rounds is entry (i, j) in \(Q^k\). Probability theory states 
that the average number of visits from state j to state i totaled over all rounds 
is simply the sum of the average number of visits on each specific round. 
So the expected number of times we visit nonabsorbing state i starting from 
nonabsorbing state j is the sum of the (i, j) entries in \(Q^k\) for all k (with 
\(Q^0 = I\) counting the starting state itself), that is, entry (i, j) in N. 

Furthermore, if we sum the entries of the jth column of N (the 
expected number of times, starting from state j, that we visit state 1 plus the 
expected number of times we visit state 2, and so on), we obtain the expected 
number of rounds until we are absorbed. The vector-matrix product \(\mathbf{1}N\) 
computes the sum of each column of N. 

We summarize this wealth of information about absorbing Markov 
chains we can get from N with the following theorem. The term absorption 
is used in this theorem to mean going to an absorbing state. 


Theorem 2. Let N be the fundamental matrix of an absorbing Markov 
chain [N is defined in (22)]. Then the following are true. 

(i) Entry \(n_{ij}\) of N is the expected number of times we visit the 
nonabsorbing state i (before absorption) when we start in 
nonabsorbing state j. 

(ii) The jth entry in the vector \(\mathbf{1}N\) gives the expected number of 
rounds before absorption when we start in nonabsorbing state j. 

(iii) Entry (i, j) in RN is the probability of eventually ending up in 
absorbing state i when we start in nonabsorbing state j. 


Example 5 (continued). A Gambling Model. 


With Theorem 2 we can answer a variety of interesting questions about
this model. We must compute the matrix N by finding the inverse of
$I - Q$, where Q is

$$
Q = \begin{array}{c|ccccc}
  & 1 & 2 & 3 & 4 & 5 \\ \hline
1 & .37 & .33 & 0 & 0 & 0 \\
2 & .30 & .37 & .33 & 0 & 0 \\
3 & 0 & .30 & .37 & .33 & 0 \\
4 & 0 & 0 & .30 & .37 & .33 \\
5 & 0 & 0 & 0 & .30 & .37
\end{array}
$$


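Using elimination by pivoting (described in Section 3.3), we obtain

$$
N = (I - Q)^{-1} = \begin{array}{c|ccccc}
  & 1 & 2 & 3 & 4 & 5 \\ \hline
1 & 2.64 & 2.21 & 1.73 & 1.21 & .63 \\
2 & 2.01 & 4.21 & 3.30 & 2.31 & 1.21 \\
3 & 1.43 & 3.00 & 4.73 & 3.30 & 1.73 \\
4 & .91 & 1.91 & 3.00 & 4.21 & 2.21 \\
5 & .43 & .91 & 1.43 & 2.01 & 2.64
\end{array}
\tag{24}
$$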


From (24) and Theorem 2, we see that if we started with $3, 
there would be 3.3 rounds during an average gambling session when 
we would be in state 2 (when we would have $2). 

Next we sum the columns of N:


$$\mathbf{1}N = [7.42,\ 12.24,\ 14.19,\ 13.04,\ 8.42] \tag{25}$$


The third entry in (25) tells us that if we start with $3, we get to play 
about 14 rounds, on average, before the game ends. 

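Finally, we compute R* = RN, where we see from (17) that R is

$$
R = \begin{array}{c|ccccc}
  & 1 & 2 & 3 & 4 & 5 \\ \hline
0 & .33 & 0 & 0 & 0 & 0 \\
6 & 0 & 0 & 0 & 0 & .30
\end{array}
\tag{26}
$$

$$
R^* = RN = \begin{array}{c|ccccc}
  & 1 & 2 & 3 & 4 & 5 \\ \hline
0 & .87 & .73 & .57 & .40 & .21 \\
6 & .13 & .27 & .43 & .60 & .79
\end{array}
$$

Entry (0, 3) of R* confirms our earlier simulation result that the
probability of going broke when we start with $3 is .57.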
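
The three parts of Theorem 2 can be checked numerically. The following is a minimal sketch, assuming NumPy is available and assuming the gambling model has states $0 through $6 with per-round probabilities .30 of winning $1, .33 of losing $1, and .37 of staying even. Those probabilities are inferred from the values printed in (24) and (25) rather than quoted from the model's definition, and all variable names are illustrative.

```python
import numpy as np

# Assumed per-round probabilities (inferred from the printed values of N and 1N).
WIN, LOSE, STAY = 0.30, 0.33, 0.37
n = 5  # nonabsorbing states: holding $1, ..., $5

# Column j holds the probabilities of leaving nonabsorbing state j+1.
Q = np.zeros((n, n))
for j in range(n):
    Q[j, j] = STAY
    if j > 0:
        Q[j - 1, j] = LOSE   # drop from $(j+1) to $j
    if j < n - 1:
        Q[j + 1, j] = WIN    # rise from $(j+1) to $(j+2)

# R: one-step moves from nonabsorbing states into the absorbing states $0 and $6.
R = np.zeros((2, n))
R[0, 0] = LOSE       # from $1 to $0
R[1, n - 1] = WIN    # from $5 to $6

N = np.linalg.inv(np.eye(n) - Q)   # fundamental matrix
rounds = np.ones(n) @ N            # column sums: expected rounds before absorption
R_star = R @ N                     # absorption probabilities

np.set_printoptions(precision=2, suppress=True)
print(N)
print(rounds)
print(R_star)
```

Each column of `R_star` should sum to 1, since from any starting stake the game eventually ends in one of the two absorbing states.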


We close this section by noting that some of these results about absorbing
Markov chains can be applied to regular Markov chains with the
following trick. Let A be the transition matrix of a regular Markov chain.
We convert one state, say state p, into an absorbing state by replacing the
pth column of A by the pth unit vector $e_p$. Now whenever we come to
state p, we stay there. Our theory of absorbing Markov chains can be applied
to this modified transition matrix to determine the expected number of rounds
it takes to get to state p if we start from any other state j (and also the
expected number of visits in any third state on the journey from j to p).

If we convert two states p and r into absorbing states with unit vector
columns, then we can compute the relative probability of reaching p or r first
when we start from some other state j. These calculations are requested for
Example 1 in the Exercises.


Example 6. Ice Cream Selection 


We surveyed a group of students eating blueberry, mint, and strawberry
ice cream about which flavor they would choose next time. Suppose
that 3/4 of those eating blueberry would choose blueberry the next
time, while the remaining quarter would choose strawberry. Responses
from others yielded the following transition matrix:


                                 Current Flavor
                        Blueberry     Mint     Strawberry

         Blueberry         3/4        1/2          0
Next     Mint               0         1/2         1/3          (27)
Time     Strawberry        1/4         0          2/3


We treat the selection of flavors as a Markov process and pose 
the question: How many rounds does it take on average for a person 
to switch from strawberry to blueberry? 

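To answer this question, we change the transition matrix in (27)
by making blueberry an absorbing state. The modified transition matrix is

$$
A = \begin{array}{c|ccc}
  & b & m & s \\ \hline
b & 1 & \tfrac{1}{2} & 0 \\
m & 0 & \tfrac{1}{2} & \tfrac{1}{3} \\
s & 0 & 0 & \tfrac{2}{3}
\end{array}
\quad \text{with} \quad
Q = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{3} \\ 0 & \tfrac{2}{3} \end{bmatrix}
\tag{28}
$$

Then we find that

$$
N = (I - Q)^{-1}
  = \begin{bmatrix} \tfrac{1}{2} & -\tfrac{1}{3} \\ 0 & \tfrac{1}{3} \end{bmatrix}^{-1}
  = \begin{bmatrix} 2 & 2 \\ 0 & 3 \end{bmatrix}
$$

and

$$\mathbf{1}N = [2,\ 5]$$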


By Theorem 2, part (ii), the second entry, 5, in $\mathbf{1}N$ is the average
number of rounds until absorption (blueberry) when starting from
strawberry. Moreover, by Theorem 2, part (i), the second column of
N tells us that on average a person starting with strawberry would
choose strawberry three times (including the initial time) and choose
mint twice before choosing blueberry.
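
The absorbing-state trick used in this example can be sketched in a few lines of Python. The sketch below uses exact fractions from the standard library; the matrix entries are our reading of (27), and the helper name `make_absorbing` is hypothetical, introduced only for illustration.

```python
from fractions import Fraction as F

# Transition matrix (27); column j gives the next-flavor probabilities
# for someone currently eating flavor j (order: blueberry, mint, strawberry).
A = [[F(3, 4), F(1, 2), F(0)],
     [F(0),    F(1, 2), F(1, 3)],
     [F(1, 4), F(0),    F(2, 3)]]

def make_absorbing(A, p):
    """Return a copy of A whose p-th column is the p-th unit vector."""
    B = [row[:] for row in A]
    for i in range(len(B)):
        B[i][p] = F(1) if i == p else F(0)
    return B

B = make_absorbing(A, 0)          # blueberry becomes absorbing

# Q is the block of B for the nonabsorbing states (mint, strawberry).
Q = [[B[1][1], B[1][2]],
     [B[2][1], B[2][2]]]

# Invert the 2-by-2 matrix I - Q with the adjugate formula.
a, b = 1 - Q[0][0], -Q[0][1]
c, d = -Q[1][0], 1 - Q[1][1]
det = a * d - b * c
N = [[d / det, -b / det],
     [-c / det, a / det]]

rounds = [N[0][j] + N[1][j] for j in range(2)]   # column sums of N
print(N)
print(rounds)
```

Working in exact fractions avoids any roundoff, so the entries of `N` and the column sums can be compared directly with the values in the text.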


With modest effort (see any textbook on Markov chains in the Ref- 
erences), one can also prove the following interesting result. 


Theorem 3. Let $p^* = [p_1^*, p_2^*, \ldots, p_n^*]$ be the stable probability
vector of a regular Markov chain. Then, if we start in state $i$, the expected
number of rounds before we return to $i$ again is $1/p_i^*$.
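
Theorem 3 can be checked on the ice cream chain of Example 6. The sketch below assumes NumPy and our reading of the entries of (27). It computes the stable vector by solving (A - I)p = 0 with one equation replaced by the requirement that the probabilities sum to 1, and it computes each return time with the absorbing-state trick described above. The function name `expected_return_time` is illustrative, not taken from the text.

```python
import numpy as np

# Ice cream transition matrix from Example 6 (columns = current flavor).
A = np.array([[3/4, 1/2, 0.0],
              [0.0, 1/2, 1/3],
              [1/4, 0.0, 2/3]])
n = A.shape[0]

# Stable vector: solve (A - I)p = 0 with the last equation replaced by sum(p) = 1.
M = A - np.eye(n)
M[-1, :] = 1.0
rhs = np.zeros(n)
rhs[-1] = 1.0
p_star = np.linalg.solve(M, rhs)

def expected_return_time(A, i):
    """Expected rounds to come back to state i, via the absorbing-state trick."""
    others = [s for s in range(A.shape[0]) if s != i]
    Q = A[np.ix_(others, others)]                 # moves among the other states
    N = np.linalg.inv(np.eye(len(others)) - Q)    # fundamental matrix
    hit = np.ones(len(others)) @ N                # expected rounds to reach i from each other state
    # One round to leave i (or stay), plus the hitting time from wherever we land.
    return 1.0 + hit @ A[others, i]

returns = np.array([expected_return_time(A, i) for i in range(n)])
print(p_star)       # stable distribution
print(returns)      # expected return times
print(1 / p_star)   # Theorem 3 predicts these agree with the line above
```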


Section 4.4 Exercises 


Summary of Exercises 

Exercises 1-14 concern regular Markov chains, their stable distributions, 
and long-term behavior. Exercises 15-25 involve analysis of absorbing 
Markov chains. 


1. Describe the behavior of the Markov chain 
oO F.C 
Do 4 
i © 0 
with starting vector [1, 0, 0]. Are there any stable vectors? 


2. Which of the following transition matrices belong to regular Markov 
chains? Find a stable distribution for each chain. 


Qo } . | 4 10 
stl oT w) |5 | ()]0 01 
1 9 0 


3. Compute the stable distribution for the weather Markov chain intro- 
duced in Section 1.3 with transition matrix 


Sunny Cloudy 


Sunny 4 3 
Cloudy | 4 5 


4. The printing press in a newspaper has the following pattern of break- 
downs. If it is working today, tomorrow it has 90% chance of working 
(and 10% chance of breaking down). If the press is broken today, it 
has a 60% chance of working tomorrow (and 40% chance of being
broken again). Compute the stable distribution for this Markov chain.


5. If the local professional basketball team, the Sneakers, wins today's
game, they have a 3 chance of winning their next game. If they lose
this game, they have a 4 chance of winning their next game. Compute
the stable distribution for this Markov chain and give the approximate
values of entries in $A^{100}$, where A is this Markov chain's transition
matrix.


6. If the stock market went up today, historical data show that it has a 
60% chance of going up tomorrow, a 20% chance of staying the same, 
and a 20% chance of going down. If the market was unchanged today, 
it has a 20% chance of being unchanged tomorrow, a 40% chance of 
going up, and a 40% chance of going down. If the market goes down 
today, it has a 20% chance of going up tomorrow, a 20% chance of being
unchanged, and a 60% chance of going down. Compute the stable 
distribution for the stock market. 


7. Write down a Markov chain to model the following situation: Assume 
that there are three types of voters in Texas: Republicans, Democrats, 
and Independent. From one (national) election to the next, 60% of 
Republicans remain Republican and similarly for the two other groups; 
among the 40% who change parties, 30% become Independent and 10% 
go to the other major party, except that the Independents who change 
all become Republicans. Determine the stable distribution among the 
three parties and from it give the approximate values of entries in A'!?”. 


8. (a) Make a Markov chain model for a rat wandering through the fol- 
lowing maze if, at the end of each period, the rat is equally likely 
to leave its current room through any of the doorways. (It never 
stays where it is.)


(b) What is the stable distribution? 


9. Repeat the questions in Exercise 8 for the following maze.


10. Find the stable distribution for Markov chains with the following
transition matrices.
¢ $50.00: 6° 6 22000 0 
24 19000 fay ae BD ae 
04340 0 0¢ ¢ 3 0 0 
(a) on ate (b) a ae 
O 0 & } ag 0 0 —¢ 6 3 O 
000 4 4 3 00024 4 2 
00004 3 00002 34 
2 #000 0 040000 
+ & £ ob Oo. 6 i 0 £6 0 0 
044300 040400 
(c) Shee (d) 
00 486 4% 0 G' G8 Oy 4.48 
000 8 4 4 Oo oO 4.6 3 
yo (OTe. 8 0 0 8 O.°+ 0 


11. Repeat Exercise 10, parts (a) and (d) with the number of states expanded
from six to n, as done in Example 4.


12. Determine the two eigenvalues and associated eigenvectors for the
Markov chain in the following exercises. Give the distribution after six
periods for the given starting distribution p by representing p as a linear
combination of the eigenvectors as was done in the end of Section 3.1.
(a) Exercise 3, starting p: Sunny 0, Cloudy 1. 

(b) Exercise 4, starting p: Working 1, Broken 0. 

(c) Exercise 5, starting p: Winning 3, Losing 4. 


13. Show that if A is a tridiagonal Markov transition matrix, then in solving
$(A - I)p = 0$ by Gaussian elimination, the L in LU decomposition of
A is a matrix with 1's on the main diagonal and -1 just below the
diagonal entries. That is, in Gaussian elimination one always adds the
current row (times 1) to the next row.


14. Re-solve the stable distribution problems in Exercise 10 with the last
row of the matrix equation $(A - I)p = 0$ replaced by the constraint
$\mathbf{1} \cdot p = 1$ (the last row always drops out, becoming 0, and the
additional constraint that the probabilities sum to 1 can be put in its place).


15. The following questions refer to the gambling Markov chain in Example 5.
If you started with $4, what is the expected number of rounds that
you have $3, and what is the expected number of rounds until the game
ends?


16. The following model for learning a concept over a set of lessons
identifies four states of learning: I = ignorance, E = exploratory thinking,
S = superficial understanding, and M = mastery. If now in state I,
after one lesson you have 3 probability of still being in I and 4
probability of being in E. If now in state E, you have 4 probability of being
in I, } in E, and 4 in S. If now in state S, you have 4 probability of
being in E, 4 in S, and 4 in M. If in M, you always stay in M (with
probability 1).

(a) Write out the transition matrix for this Markov chain with the
absorbing state as the first state.

(b) Compute the fundamental matrix N.

(c) What is the expected number of rounds until mastery is attained if
currently in the state of ignorance?


17. In the following maze suppose that a rat has a 20% chance of going
into the middle room, which is an absorbing state, and a 40% chance
each of going to the room on the left or on the right.


(a) What is the expected number of times a rat starting in room 1 enters
room 2?

(b) What is the expected number of rounds until the rat goes to the
middle room?


18. (a) Make a Markov chain model for a rat wandering through the fol- 
lowing maze if, at the end of each period, the rat is equally likely 
to leave its current room through any of the doorways. The center 
room is an absorbing state. (It never stays in the same room.) 


(b) If the rat starts in room 4, what is the expected number of times it 
will be in room 2? 

(c) If the rat starts in room 4, what is the expected number of rounds
until absorption?


19. Consider the game of Ping-Pong with the following states:


A: Player 1 is hitting the ball.
B: Player 2 is hitting the ball.
C: Play is dead because 1 hit the ball out or in the net.
D: Play is dead because 2 hit the ball out or in the net.


The transition matrix is 


(A hitting ball) | 
(B hitting ball) 2 
(A hit ball out) 3 
(B hit ball out) 4 


SD Af Ce 
~~ oo © 
ore CO SC WwW 
—oCcCo + 


If we start play with player A hitting the ball (in state 1) 

(a) What is the expected number of times player A hits the ball (before 
the point is over)? 

(b) What is the expected number of hits by A and B (before the point
is over)?

(c) What is the probability that player A hits the ball out (i.e., that 
player B wins the point)? 


20. Repeat Exercise 18 for the following maze, in which rooms 1 and 5
are absorbing. Start in room 2.


21. Repeat the Markov chain model of a poker game given in Example 5
but now with probability 1/3 that a player wins 1 dollar in a period, with
probability 1/3 a player loses 1 dollar, and with probability 1/3 a player
stays the same. The game ends if the player loses all his or her money
or if the player has 6 dollars. Compute N, $\mathbf{1}N$, and RN for this
problem.


22. Three tanks A, B, and C are engaged in a battle. Tank A, when it fires,
hits its target with hit probability 2. B hits its target with hit probability 
3, and C with hit probability %. Initially (in the first period), B and C 
fire at A and A fires at B. Once one tank is hit, the remaining tanks aim 
at each other. The battle ends when there is one or no tank left. The 


transition matrix for this game is (the states are the subsets of tanks
surviving)
        ABC  AC  BC  A  B  C  None
ABC |\7% 0 0000 O 
Ae ete fe 1. Oy @ 
BC ols C8 40-0 G8 @ 
ead: se. OO F 8-8 
Bok OO. te RL OY. eB 
Cas or oO kB 
None |0 wf t+ 000 1 


(a) Determine the expected number of rounds that the battle lasts (start- 
ing from state ABC). 

(b) What are the chances of the different tanks winning (being the sole 
surviving tank)? 


23. Compute A? and A‘, in partitioned form, for the partitioned matrix A 
in (17). 


24. Modify the frog Markov chain in Example 1 by making states 1 and 6
absorbing states. Compute the probability, when started in state 3, of
being absorbed in state 1. Also compute the expected number of periods
until absorption (in state 1 or 6).


25. Modify the frog Markov chain in Example 1 by making state 1 an
absorbing state. Starting from state 5, compute the expected number of
periods until absorption and the expected number of visits to state 6.


Growth Models 


In this section we examine three models for growing populations. We have 
already seen a simple linear model, introduced in Section 1.3, for the growth 
of two competing species, rabbits and foxes. Here we will study models for 
the growth of one species that is subdivided into different age groups. For 
simplicity we again let rabbits be the object of study in the models. However, 
our models apply to any renewable natural resource, from animals to forests, 
and to many human enterprises, be they economic or social. The first model 
has been applied to human populations to predict population cycles and to 
set insurance rates. 


Example 1. Age-Specific Population Model


We want a model that breaks down a population into different age
groups. Human population models commonly have about 20 age
groups, with each age group spanning 5 years, plus a special group
for the first year of life (since mortality rates for newborns are different
from other young children) and a last group consisting of everyone
past some advanced age, say 90 years. Each age group is really two
groups, one for men and one for women. The study of the sizes of
human populations is called demography.

To make matters simple, we will content ourselves here with a
three-age-group model of rabbits.


y = young rabbits, up to 2 years old
m = midlife rabbits, between 2 and 4 years old
o = old rabbits, 4 to 6 years old


Let one period of time equal 2 years. Our model for the next period's
population vector a' = [y', m', o'] in terms of the current population
a = [y, m, o] is

$$
\begin{aligned}
y' &= 4m + o \\
m' &= .4y \\
o' &= .6m
\end{aligned}
\tag{1}
$$


The first equation in (1) says that each midlife rabbit gives birth 
to 4 young each period and that each old rabbit gives birth to 1 young 
each period (of course, only females have babies, but in this initial 
model we are not differentiating between sexes). The second equation 
says that 40% of all young rabbits survive through their first 2 years 
(one period). The third equation says that 60% of midlife rabbits live 
through a period to become old rabbits. Finally, assume that all old 
rabbits die within 2 years. If L is the matrix of coefficients in (1),

$$
L = \begin{bmatrix} 0 & 4 & 1 \\ .4 & 0 & 0 \\ 0 & .6 & 0 \end{bmatrix}
\tag{2}
$$

then (1) has the matrix form

$$a' = La \tag{3}$$


This population model is called a Leslie model. If there were
more age groups, the matrix L of coefficients would have the form

$$
L = \begin{bmatrix}
0 & b_2 & b_3 & b_4 & \cdots & b_n \\
p_1 & 0 & 0 & 0 & \cdots & 0 \\
0 & p_2 & 0 & 0 & \cdots & 0 \\
0 & 0 & p_3 & 0 & \cdots & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & p_{n-1} & 0
\end{bmatrix}
\tag{4}
$$

where $b_i$ is the number of offspring per individual in group $i$ and $p_i$ is
the probability that an individual in group $i$ survives one period to
become a member of group $i + 1$.

The model is somewhat like a Markov chain, except that the
numbers $b_i$ are not probabilities. Rather than summing to 1, the three
variables y, m, and o will grow larger or smaller over time. We want
to know the behavior of this model over many periods. Will the total 
number of rabbits increase or decrease? Will there be any cyclic pat- 
terns in the population, such as a surge in the young one year followed 
a period later by a surge in midlifes, then the next period a surge in 
the young, continuing back and forth? Or after several periods, will 
there be a steady distribution of the population; for example, will the 
fractions of rabbits that are young, are midlife, and are old remain the 
same from period to period? 

The answers to these questions depend on the eigenvalues and
eigenvectors of L. As we saw in Sections 2.5 and 3.4, the long-term
population distribution $L^k a$, for large $k$, will be a multiple of the
dominant eigenvector of L (the eigenvector associated with the largest
eigenvalue), and the long-term growth rate will be the dominant (largest)
eigenvalue. Whether the model converges quickly to the dominant
eigenvector depends on how much the largest eigenvalue dominates the
second largest eigenvalue.


Table 4.3


Period     Young   Midlife     Old     Total

   0         100        50      30       180
   1         230        40      30       300
   2         190        92      24       306
   3         392        76      55       523
   4         359       157      47       563
   5         673       144      94       911
   6         669       269      86      1024
   7        1162       266     161      1589
   8        1232       465     160      1857
   9        2021       493     279      2793
  10        2250       808     295      3353
  11        3529       900     485      4914
  14        7382      2474     980     10836
  19       33873      9553    4602     48028
  20       42815     13549    5732     62096


To start, let us examine the behavior of our model with a computer
simulation. Starting with an initial population of 100 young, 50
midlife, and 30 old, we compute a table of populations in successive
periods using (1); we have rounded values to whole numbers in Table
4.3. Initially, we see a very pronounced cycling behavior between
young and midlife rabbits, and this in turn leads to an uneven growth
in the total population: there is little growth between periods 1 and 2,
between 3 and 4, or between 5 and 6. The cycling is much smaller
after 20 periods but still present. If we run the model a little longer,
we see that the population stabilizes with a distribution and growth
multiplier


Long-term distribution: 70% young, 21% midlifes, and 9% old
Growth multiplier: 1.334 (33.4% growth rate)                      (5)


That is, the population vector in the next period is about 1.334 times
this period's population vector.
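
The simulation behind Table 4.3 is easy to reproduce. The following is a minimal sketch, assuming NumPy; it iterates model (1) without rounding at each step, so later rows may differ from the printed table by a unit or two. The number of periods (100) is an arbitrary choice made here so that the cycling has died out before the long-run figures are estimated.

```python
import numpy as np

# Leslie matrix for model (1): rows give y', m', o' in terms of [y, m, o].
L = np.array([[0.0, 4.0, 1.0],
              [0.4, 0.0, 0.0],
              [0.0, 0.6, 0.0]])

a = np.array([100.0, 50.0, 30.0])   # initial young, midlife, old
history = [a]
for _ in range(100):
    a = L @ a
    history.append(a)

for period in (0, 1, 2, 3, 20):
    y, m, o = history[period]
    print(period, round(y), round(m), round(o), round(y + m + o))

# Long-run behavior estimated from the last two simulated periods.
growth = history[-1].sum() / history[-2].sum()
shares = history[-1] / history[-1].sum()
print(round(growth, 3), np.round(shares, 3))

# Cross-check against the eigenvalues of L.
eigenvalues = np.linalg.eigvals(L)
dominant = max(eigenvalues, key=abs)
print(np.sort(eigenvalues.real))
```

The ratio of successive totals estimates the growth multiplier in (5), and the largest eigenvalue of L gives the same number without any simulation.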


Using more advanced techniques introduced in the Appendix to Section
5.5 (or using the appropriate mathematical software), we find that the
eigenvalues of L in decreasing absolute size are

$$\lambda_1 = 1.334, \quad \lambda_2 = -1.182, \quad \lambda_3 = -.152$$

and the dominant eigenvector (associated with $\lambda_1$) is

$$u_1 = [.697,\ .209,\ .094]$$

The fact that $\lambda_2$ is close to $\lambda_1$ in absolute size is why the
simulation took a long time to stabilize at $u_1$.

In mathematics, the best way to understand a property of interest is
often to study cases where the property fails to be true. We shall now take
this approach and look at a Leslie model whose group percentages do not
converge to the dominant eigenvector.


Example 2. A Cyclic Leslie Model

Consider the following Leslie model:

$$
\begin{aligned}
y' &= 4o \\
m' &= .5y \\
o' &= .5m
\end{aligned}
\tag{6}
$$

with the Leslie matrix

$$
L = \begin{bmatrix} 0 & 0 & 4 \\ .5 & 0 & 0 \\ 0 & .5 & 0 \end{bmatrix}
\tag{7}
$$

Let the initial population be 100 young and no midlife or old rabbits.
Then the next period's population is easily seen to have just 50
midlifes, the next following period just 25 old rabbits, and the third
following period just 100 young, returning to the initial population. This
cycle will repeat over and over again, and there will never be a stable
distribution of age groups: we have pure cycling. No matter what the
starting population is, it will repeat every three periods. Such cycling
is very unusual and results from properties of the eigenvalues of this
problem.

Let us compute the eigenvalues of L in (7). Recall from Section
3.1 that the eigenvalues $\lambda$ are the zeros of the determinant of the
matrix $(L - \lambda I)$. In this example, $\det(L - \lambda I)$ has a simple
form that is easy to work with.

$$
L - \lambda I = \begin{bmatrix} -\lambda & 0 & 4 \\ .5 & -\lambda & 0 \\ 0 & .5 & -\lambda \end{bmatrix}
\tag{8}
$$

so

$$
\det(L - \lambda I) = (-\lambda)(-\lambda)(-\lambda) + (4)(.5)(.5) = -\lambda^3 + 1
\tag{9}
$$

Although the determinant of a 3-by-3 matrix involves taking six
diagonal products (see Section 3.1), the three 0's in (8) eliminate all but
two of these products.

Setting the determinant $-\lambda^3 + 1$ equal to 0, we get

$$\lambda^3 = 1 \tag{10}$$

One obvious root of (10) is $\lambda = 1$. But all cubic equations must have
three roots. The other two roots, called roots of unity, involve complex
numbers. They are $-\tfrac{1}{2} \pm \tfrac{\sqrt{3}}{2} i$, where
$i = \sqrt{-1}$. These complex numbers have absolute value 1 also. So there
are three dominant eigenvalues of size 1, instead of a single one as is
usually the case. This is why there is not a single dominant long-term effect
as occurred in Example 1.
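
A short numerical check of this example, assuming NumPy, is sketched below. It computes the eigenvalues of the cyclic Leslie matrix and confirms that three periods return any population to its starting point; the starting vector of 100 young rabbits is the one used in the example.

```python
import numpy as np

# Leslie matrix (7) for the cyclic model (6).
L = np.array([[0.0, 0.0, 4.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0]])

eigenvalues = np.linalg.eigvals(L)
print(eigenvalues)            # one real root and a complex-conjugate pair
print(np.abs(eigenvalues))    # absolute values of the three eigenvalues

# Three periods bring any population back to where it started.
L3 = np.linalg.matrix_power(L, 3)
a0 = np.array([100.0, 0.0, 0.0])
print(L @ a0, L @ L @ a0, L3 @ a0)
```

Because the cube of L is the identity matrix, the cycling is exact for every starting population, not just this one.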


Perpetual cycling occurs if there are several largest eigenvalues (in
absolute size). If the largest eigenvalue is complex, we must get cyclic
behavior, since complex zeros of a polynomial always come in conjugate
pairs ($c + id$ and $c - id$) of the same absolute value. The cyclic Markov
chain in Example 2 of Section 4.4 had two largest eigenvalues. Recall that
its transition matrix was

$$
A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\tag{11}
$$

for which $\det(A - \lambda I) = \lambda^2 - 1$. So its eigenvalues were 1 and
$-1$. The following theorem, whose proof is beyond the scope of this book,
gives a condition for assuring noncyclic behavior in a Leslie model.


Theorem. A Leslie growth model with matrix L given in (4) will have a
unique largest eigenvalue that is a real number, and hence a stable
long-term population distribution, if some consecutive pair $b_i$, $b_{i+1}$
of entries in the first row of L are both positive, that is, if two
consecutive age groups give birth to offspring.


The condition in this theorem is satisfied in Example 1 but not in
Example 2.


Example 3. Harvesting a Renewable Resource 


This time we shall grow rabbits for profit, to sell some of the rabbit
population every year. The goal will be to determine the proper
population size and distribution among age groups so that we can "harvest"
a given number of rabbits each year without depleting the population.
That is, we want a minimal-size collection of rabbits that will
sustain a given harvest forever. This time we shall use a model that
differentiates between females and males. We shall again use three
different age categories, but now the age spans in the groups will vary.
The groups are


bm = baby male rabbits (less than 1 year old)
bf = baby female rabbits
ym = yearling male rabbits (between 1 and 2 years old)              (12)
yf = yearling female rabbits
am = adult male rabbits (2 or more years old)
af = adult female rabbits


The time period is 1 year. In this model adults do not all die at the
end of one time period but rather survive to the next year with
probability .75. We shall only harvest adults: hm is the number of male
rabbits harvested and hf the number of female rabbits harvested. Let us try
using the following equations.

$$
\begin{aligned}
bm' &= 2af \\
bf' &= 2af \\
ym' &= .6bm \\
yf' &= .6bf \\
am' &= .6ym + .75am - hm \\
af' &= .6yf + .75af - hf
\end{aligned}
\tag{13}
$$


If A is the matrix of coefficients on the right-hand side of (13),
excluding terms $-hm$ and $-hf$, x and x' are the 6-entry vectors of
current and next-period age-group sizes, and h is the harvesting vector
[0, 0, 0, 0, hm, hf], we have

$$x' = Ax - h \tag{14}$$

For stable harvesting, we seek values for the variables in (12)
such that specified amounts can be harvested from each group in a
year [in (13) only am and af are harvested] and in the next year all the
groups will be the same sizes, ready to be harvested again.
Mathematically, this means that in (14), x' = x. So (14) becomes

$$x = Ax - h$$

With matrix algebra, we have

$$(A - I)x = h \tag{15}$$

or

$$
\begin{aligned}
-bm + 2af &= 0 \\
-bf + 2af &= 0 \\
.6bm - ym &= 0 \\
.6bf - yf &= 0 \\
.6ym - .25am &= hm \\
.6yf - .25af &= hf
\end{aligned}
\tag{16}
$$


Let us also compare, in a very general way, the difference be- 
tween solving this model and the Leslie growth model in Example 1. 
The Leslie model is a dynamic growing model whose solution involves 
an eigenvalue problem, while here we have a static model (one period 
is like the next period) whose solution involves solving a standard 
system of n equations in n unknowns. A Leslie model like (4) in
Example 1 will converge to a constant distribution of ages in all but 
the most exceptional cases, whereas system (15) can easily have no 
solution of constant population with harvesting. For example, if with- 
out harvesting the population naturally decreases (i.e., ||A|| < 1), then 
with harvesting it will decrease even faster and eventually become 
extinct. 

We cannot find a solution to (15) by iterating (13) and hoping 
for the variables to converge after many periods to the stable harvest 
distribution. The reverse happens: If you do not start with the right 
population, the populations over successive periods will move farther 
away from the desired answer. Such divergence means that one has to 
be very careful about roundoff errors in solving (16). 
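
The divergence just described can be observed directly. The sketch below, assuming NumPy, builds the coefficient matrix of (13), solves (15) exactly for the illustrative harvest hm = hf = 100, perturbs that stable population by a single adult female, and then iterates (14). The size of the perturbation and the number of periods (20) are arbitrary choices made for this illustration.

```python
import numpy as np

# Coefficient matrix of (13), variable order [bm, bf, ym, yf, am, af].
A = np.array([[0,   0,   0,   0,   0,    2   ],
              [0,   0,   0,   0,   0,    2   ],
              [0.6, 0,   0,   0,   0,    0   ],
              [0,   0.6, 0,   0,   0,    0   ],
              [0,   0,   0.6, 0,   0.75, 0   ],
              [0,   0,   0,   0.6, 0,    0.75]], dtype=float)
h = np.array([0, 0, 0, 0, 100, 100], dtype=float)

x_stable = np.linalg.solve(A - np.eye(6), h)   # exact solution of (15)

x = x_stable.copy()
x[5] += 1.0          # start with one extra adult female
gaps = []
for _ in range(20):
    x = A @ x - h    # iterate (14)
    gaps.append(np.linalg.norm(x - x_stable))

print(np.round(gaps, 2))                        # distance from the stable population
spectral_radius = max(abs(np.linalg.eigvals(A)))
print(round(spectral_radius, 3))
```

The deviation from the stable population obeys d' = Ad, so it grows whenever the largest eigenvalue of A exceeds 1 in absolute value, which is why iteration cannot be used to find the solution of (15).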

Let us choose the values hm = hf = 100, and solve (16) by
Gaussian elimination. We obtain (with answers rounded to whole numbers)

$$bm = 426,\ bf = 426,\ ym = 255,\ yf = 255,\ am = 213,\ af = 213 \tag{17}$$

Observe that while only females produce offspring, it appears that this
model really groups males and females together, in that there are the
same numbers of males and females in each age group. So far we have
seen only that this equality occurs when hm = hf = 100. To see that
it is true for hm = hf = 100k, for any k > 0, write (16) in matrix form
as

$$(A - I)x = kh^*, \quad \text{where } h^* = [0, 0, 0, 0, 100, 100] \tag{18}$$

In (17), we have the solution x* = [426, 426, 255, 255, 213, 213]
to (18) when k = 1: that is, $(A - I)x^* = h^*$. Then by linearity,

$$(A - I)(kx^*) = k(A - I)x^* = kh^* \tag{19}$$

So when hm = hf = 100k, the stable population in our harvesting
model will be kx* = [426k, 426k, 255k, 255k, 213k, 213k].

We see from (17) that the sex differentiation can be dropped from
the original model in (13) when hm = hf, yielding the simpler model

$$
\begin{aligned}
b' &= 2a \\
y' &= .6b \\
a' &= .6y + .75a - h
\end{aligned}
\tag{20}
$$

where b = baby rabbits, y = yearlings, a = adults, h = harvest. It
was not at all obvious in advance that these two models would be
equivalent.

Suppose that hm ≠ hf. If we harvest twice as many of one sex
as of the other, we obtain the results shown in Table 4.4 by re-solving
(16) for these new values of hm and hf.


Table 4.4

   Harvest                    Stable Population
  hm     hf       x1     x2     x3     x4     x5     x6

  50    100      426    426    255    255    413    213
 100     50      212    212    127    127    -93    106


Surprise! Our model gives us a negative answer when we try to harvest
twice as many adult males as females. Why did this happen? What is
the smallest ratio of harvested males to females that can occur?

To get a fuller understanding of our harvesting model, let us
compute the inverse of the coefficient matrix in (16).

$$
(A - I)^{-1} = \begin{bmatrix}
-1 & 1.53 & 0 & 2.55 & 0 & 4.25 \\
0 & .53 & 0 & 2.55 & 0 & 4.25 \\
-.6 & .92 & -1 & 1.53 & 0 & 2.55 \\
0 & .32 & 0 & .53 & 0 & 2.55 \\
-1.44 & 2.21 & -2.4 & 3.68 & -4 & 6.13 \\
0 & .77 & 0 & 1.28 & 0 & 2.13
\end{bmatrix}
\tag{21}
$$

Notice how the sparsity of the original coefficient matrix is lost in the
inverse. This inverse gives an explicit formula for x in terms of h that
can be invaluable. For example, if only adult males and females are
harvested, then h = [0, 0, 0, 0, hm, hf] and multiplying $(A - I)^{-1}$
times this h yields a formula for the stable population vector x.

$$
x = (A - I)^{-1}h
  = [4.25hf,\ 4.25hf,\ 2.55hf,\ 2.55hf,\ -4hm + 6.13hf,\ 2.13hf]
\tag{22}
$$

The form of the solution vector in (22) answers all questions about the
behavior of this particular model. In particular, for the fifth component
to be nonnegative, we need $-4hm + 6.13hf \ge 0$, so hf should be at least
4/6.13 (about .65, or roughly 2/3) of hm.
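
The stable-harvest calculation can be reproduced with a linear solver. The following sketch, assuming NumPy, builds the coefficient matrix of (13), solves (15) for several harvest vectors, and prints the inverse of the coefficient matrix. The function name `stable_population` and the variable `min_ratio` are illustrative names chosen here, not notation from the text.

```python
import numpy as np

# Coefficient matrix A of (13), variable order [bm, bf, ym, yf, am, af].
A = np.zeros((6, 6))
A[0, 5] = A[1, 5] = 2.0      # each adult female bears 2 male and 2 female babies
A[2, 0] = A[3, 1] = 0.6      # babies surviving to become yearlings
A[4, 2] = A[5, 3] = 0.6      # yearlings surviving to become adults
A[4, 4] = A[5, 5] = 0.75     # adults surviving another year

M = A - np.eye(6)

def stable_population(hm, hf):
    """Solve (A - I)x = h for the harvest vector h = [0, 0, 0, 0, hm, hf]."""
    h = np.array([0, 0, 0, 0, hm, hf], dtype=float)
    return np.linalg.solve(M, h)

for hm, hf in [(100, 100), (50, 100), (100, 50)]:
    print(hm, hf, np.round(stable_population(hm, hf)))

M_inv = np.linalg.inv(M)
print(np.round(M_inv, 2))

# The fifth component of x is M_inv[4, 4]*hm + M_inv[4, 5]*hf; it is
# nonnegative only when hf/hm is at least this ratio.
min_ratio = -M_inv[4, 4] / M_inv[4, 5]
print(round(min_ratio, 3))
```

Solving the system directly is usually preferable to forming the inverse; the inverse is computed here only because the text uses it to read off the dependence of x on hm and hf.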


We now consider a simplified model for rabbit growth that results in 
a single equation. However, this equation will involve the population in the 
current period and the previous period. 


Example 4. Recurrence Model for Rabbit Growth 


First we consider an extremely simplified model, in which the
population doubles in each successive period. If $r_n$ is the rabbit
population in the nth period, we have

$$r_n = 2r_{n-1} \tag{23}$$

If we started with $r_0 = A$ rabbits in period 0, then in period 1
we would have $r_1 = 2A$ rabbits, and in the next period $r_2 = 4A$
rabbits. It is not hard to see that the formula for $r_n$ is

$$r_n = 2^n A$$

For the general problem,

$$r_n = c\,r_{n-1} \quad \text{has solution} \quad r_n = c^n A \tag{24}$$

Now let us consider a model with adults and young rabbits. We
suppose that once a pair of rabbits are 1 year old, they have one pair
of offspring every year for the rest of their lives. Assume that all pairs
consist of one female and one male. We make a two-group
mathematical model of the rabbit population.

Let $m_n$ denote the number of mature pairs (at least 1 year old)
during period n, $y_n$ denote the number of young pairs (under 1 year
old) during period n, and $r_n$ denote the total number of rabbit pairs
during period n. If we assume that no rabbits die (the model will not
be valid for long periods), then from one period of time to the next,
these quantities obey the equations

$$
\begin{aligned}
y_n &= m_{n-1} \\
m_n &= m_{n-1} + y_{n-1}
\end{aligned}
\tag{25}
$$

and

$$r_n = m_n + y_n \tag{26}$$

Observe that $r_n$ can also be expressed as $r_{n-1}$ plus the number of new
pairs born in the nth year (i.e., $y_n$).

$$r_n = r_{n-1} + y_n \tag{27}$$

Comparing the right-hand sides of (26) and (27), we conclude that

$$m_n = r_{n-1} \tag{28}$$

In words, we explain (28) by the fact that any rabbit, young or mature,
alive one period ago will be an adult this period. And restating (28)
for the previous year, we have $m_{n-1} = r_{n-2}$. When this identity is
combined with $y_n = m_{n-1}$ [equation (25)], we have

$$y_n = m_{n-1} = r_{n-2} \tag{29}$$

Substituting (29) for $y_n$ in (27), we obtain the following
simple relation for $r_n$:

$$r_n = r_{n-1} + r_{n-2} \tag{30}$$


Equations such as (30) that tell how to compute the next number in a
sequence $r_0, r_1, r_2, \ldots$ are called recurrence relations. Equation
(30) is called a second-order relation because the right-hand side goes
back 2 years. Recurrence relations are the discrete counterpart to
differential equations, that is, when time is measured in discrete units
rather than continuously.

Equation (30) is called the Fibonacci relation (named after the
thirteenth-century Italian mathematician Fibonacci, who first studied
this growth model). For example, if we started with one young pair
of rabbits (i.e., $r_0 = 1$), after one period we would have one adult
pair ($r_1 = 1$) and thereafter we could use (30) to get the following
sequence of population sizes:

$$r_2 = 2,\ r_3 = 3,\ r_4 = 5,\ r_5 = 8,\ r_6 = 13,\ r_7 = 21,\ r_8 = 34$$

and so on. The numbers in this sequence are called the Fibonacci
numbers. Fibonacci numbers arise in many settings in nature and
mathematics. Perhaps the most famous example is the spiral pattern
of leaves around a blossom: On apple or oak stems, there are
5 ($= r_4$) leaves for every two ($= r_2$) spiral turns; on pear stems,
8 ($= r_5$) leaves for every three ($= r_3$) spiral turns; on willow stems,
13 ($= r_6$) leaves for every five ($= r_4$) spiral turns. There are biological
reasons involving the Fibonacci relation for these numbers.
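
The derivation of the Fibonacci relation can be checked by iterating the two-group model directly. The sketch below is plain Python; the function name `rabbit_pairs` is chosen here for illustration, and the starting condition (one young pair, no mature pairs) is the one used in the example.

```python
def rabbit_pairs(periods):
    """Iterate the two-group model (25)-(26), starting from one young pair."""
    young, mature = 1, 0
    totals = [young + mature]                  # r_0
    for _ in range(periods):
        # (25): last period's mature pairs each produce one young pair;
        # every pair alive last period is mature now.
        young, mature = mature, mature + young
        totals.append(young + mature)          # (26)
    return totals

r = rabbit_pairs(8)
print(r)

# Every term from r_2 on satisfies the Fibonacci relation (30).
fibonacci_holds = all(r[n] == r[n - 1] + r[n - 2] for n in range(2, len(r)))
print(fibonacci_holds)
```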

The Leslie model in Example 1 was a system of recurrence
relations [as was our original model (25)]. That is, (1) could have been
written

$$
\begin{aligned}
y_n &= 4m_{n-1} + o_{n-1} \\
m_n &= .4y_{n-1} \\
o_n &= .6m_{n-1}
\end{aligned}
\tag{31}
$$

Harvesting equations (13) in Example 3 are recurrence relations (but
there we are not interested in growth of the model over many periods;
instead, we seek a starting distribution which will remain constant with
annual harvesting). The transition equations of a Markov chain are also
recurrence relations.


[Note that the matrix generalization of the solution for a first-order
recurrence relation, given in (24), tells us that the solution of
the system $a_n = La_{n-1}$ in (31) is

$$a_n = L^n a_0$$

Unfortunately, this is not new information.]


Let us return to the second-order recurrence relation for r_n in
(30). The theory of recurrence relations says that any linear recurrence
relation for r_n of the form (where k and the c_i are constants)

    r_n = c_1 r_{n-1} + c_2 r_{n-2} + \cdots + c_k r_{n-k}        (32)

has solutions of the form

    r_n = b \alpha^n        (33)

where b and \alpha are values to be determined. Note the similarity with
the form of solution for linear differential equations given in Section
4.3. We can determine \alpha by substituting (33) into equation (32). For
the relation r_n = r_{n-1} + r_{n-2}, (33) yields

    b \alpha^n = b \alpha^{n-1} + b \alpha^{n-2}

We can simplify this equation by dividing both sides by b \alpha^{n-2} to
obtain

    \alpha^2 = \alpha + 1    or    \alpha^2 - \alpha - 1 = 0        (34)


Paralleling the terminology for differential equations, equation (34) is 
called the characteristic equation of the recurrence relation. The left- 
side polynomial in (34) is called its characteristic polynomial. 

The two roots of the characteristic equation in (34) are found by
the quadratic formula

    ax^2 + bx + c = 0   has roots   x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}        (35)

In our case, the solutions are (1 \pm \sqrt{5})/2, or approximately 1.618
and -.618. It is quite surprising that the simple sequence formed by
the Fibonacci numbers should turn out to be a function of an irrational
number such as \sqrt{5}.

As with differential equations, the general solution to (30) is a
linear combination of the two solutions we have found:

    r_n = b_1 \left(\frac{1 + \sqrt{5}}{2}\right)^n + b_2 \left(\frac{1 - \sqrt{5}}{2}\right)^n        (36)

For simplicity, let \alpha_1 = (1 + \sqrt{5})/2 (\approx 1.618) and \alpha_2 =
(1 - \sqrt{5})/2 (\approx -.618). So (36) becomes

    r_n = b_1 \alpha_1^n + b_2 \alpha_2^n        (37)


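As with differential equations, we solve for b_1 and b_2 by inserting the
starting conditions in (37). Suppose that r_0 = r_1 = 1.

    1 = r_0 = b_1 \alpha_1^0 + b_2 \alpha_2^0 = b_1 + b_2
    1 = r_1 = b_1 \alpha_1^1 + b_2 \alpha_2^1 = \alpha_1 b_1 + \alpha_2 b_2        (38)

Solving these two equations in two unknowns, we obtain

    b_1 = \frac{1 - \alpha_2}{\alpha_1 - \alpha_2} = \frac{1 + \sqrt{5}}{2\sqrt{5}},    b_2 = \frac{\alpha_1 - 1}{\alpha_1 - \alpha_2} = \frac{\sqrt{5} - 1}{2\sqrt{5}}        (39)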


Remember that \alpha_2 \approx -.618, so |b_2 \alpha_2^n| will always be < 1/2. Thus the
formula for the Fibonacci numbers is

    r_n = closest integer to \frac{1}{\sqrt{5}} \left(\frac{1 + \sqrt{5}}{2}\right)^{n+1}        (40)

Unfortunately, this formula is much harder to use than the original
recurrence relation (30). That is, iteratively computing successive
Fibonacci numbers by summing the previous two numbers is much
easier than using (40). The one important fact we do get from (40) is
that r_n grows at an exponential rate (faster than any polynomial in n).
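
The comparison between iterating (30) and using the closed form (40) can be checked numerically. The following is a minimal sketch, assuming the starting values r_0 = r_1 = 1 used above; the function names are illustrative and not part of the text.

```python
import math

def fib_by_recurrence(n):
    """Return [r_0, ..., r_n] with r_0 = r_1 = 1 and r_k = r_{k-1} + r_{k-2}."""
    r = [1, 1]
    for k in range(2, n + 1):
        r.append(r[k - 1] + r[k - 2])
    return r[: n + 1]

def fib_by_formula(n):
    """Closest integer to alpha_1^(n+1) / sqrt(5), as in (40)."""
    alpha1 = (1 + math.sqrt(5)) / 2
    return round(alpha1 ** (n + 1) / math.sqrt(5))

seq = fib_by_recurrence(8)
print(seq)                                    # values from the recurrence
print([fib_by_formula(n) for n in range(9)])  # values from the closed form
```

Because the closed form uses floating-point arithmetic, it should only be trusted for moderate n; the recurrence stays exact with Python integers.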

= 


There is one important similarity to note between Example 1 and Example 4.
The largest root of the characteristic equation (34) is
the growth rate of this model, just as the largest eigenvalue of the Leslie
model is that model’s growth rate. To illustrate this link further, let us treat
our original pair of recurrence relations for young and adults as a Leslie
model:

    y_n = m_{n-1}
    m_n = y_{n-1} + m_{n-1}        (41)

The matrix of coefficients in (41) is

    L = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}        (42)

The eigenvalues of this Leslie matrix L are the roots of det(L - \lambda I).

    det(L - \lambda I) = \begin{vmatrix} -\lambda & 1 \\ 1 & 1 - \lambda \end{vmatrix} = \lambda^2 - \lambda - 1        (43)

But (43) is just the characteristic polynomial in (34) for our second-order
recurrence relation. So the roots of (43) are again (1 \pm \sqrt{5})/2.
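
As a numerical check of this link, the sketch below iterates the Leslie model (41) from one young pair and compares the period-to-period growth of the total population with the larger root (1 + \sqrt{5})/2. The starting vector and the 30-period horizon are assumptions chosen for illustration.

```python
import math

# Leslie matrix of (41): rows give y_n and m_n in terms of y_{n-1}, m_{n-1}.
L = [[0, 1],
     [1, 1]]

def step(x):
    """One period of growth: return L x for x = [young, adult]."""
    return [L[0][0] * x[0] + L[0][1] * x[1],
            L[1][0] * x[0] + L[1][1] * x[1]]

x = [1, 0]            # one young pair, no adult pairs
totals = [sum(x)]     # total number of pairs in each period
for _ in range(30):
    x = step(x)
    totals.append(sum(x))

growth = totals[-1] / totals[-2]     # observed long-run growth rate
alpha1 = (1 + math.sqrt(5)) / 2      # larger root of lambda^2 - lambda - 1
print(totals[:9])
print(growth, alpha1)
```

The totals reproduce the Fibonacci numbers, and the ratio of successive totals settles at the dominant eigenvalue of L.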

Iterating the growth model x' = Lx is equivalent to iterating the
recurrence relation. Using the eigenvector coordinate approach from Section
2.5, let us write x as a linear combination of the eigenvectors u_1, u_2 of L:

    x = a_1 u_1 + a_2 u_2

Then

    x^{(n)} = L^n x = a_1 L^n u_1 + a_2 L^n u_2 = a_1 \alpha_1^n u_1 + a_2 \alpha_2^n u_2

or

    \begin{bmatrix} x_1^{(n)} \\ x_2^{(n)} \end{bmatrix} = a_1 \alpha_1^n \begin{bmatrix} u_{11} \\ u_{12} \end{bmatrix} + a_2 \alpha_2^n \begin{bmatrix} u_{21} \\ u_{22} \end{bmatrix}        (44)

The sum x_1^{(n)} + x_2^{(n)} of the components of x^{(n)} is r_n, the total number
of rabbits. From (44), this sum is

    r_n = x_1^{(n)} + x_2^{(n)} = a_1 \alpha_1^n (u_{11} + u_{12}) + a_2 \alpha_2^n (u_{21} + u_{22})        (45)

If we define constants b_1, b_2 as follows:

    b_1 = a_1 (u_{11} + u_{12})    and    b_2 = a_2 (u_{21} + u_{22})

then (45) becomes

    r_n = b_1 \alpha_1^n + b_2 \alpha_2^n        (46)


Formula (46) is the same answer we got in (37). Then the b_1 and b_2
in (46) and in (37) must be the same constants. Thus the equations (38) for
determining initial-condition constants b_1, b_2 are closely related to the
equations for determining the weights a_1, a_2 in writing the initial vector x as a
linear combination of the eigenvectors, x = a_1 u_1 + a_2 u_2.

We close by noting that any recurrence relation can be recast as a
system of first-order equations, the way the Fibonacci relation was in (41).
For example,

    r_n = a_1 r_{n-1} + a_2 r_{n-2} + a_3 r_{n-3} + a_4 r_{n-4}        (47)

becomes

    r_n = a_1 r_{n-1} + a_2 s_{n-1} + a_3 t_{n-1} + a_4 u_{n-1}
    s_n = r_{n-1}
    t_n = s_{n-1}        (48)
    u_n = t_{n-1}

or \mathbf{r}_n = A \mathbf{r}_{n-1}.

One can check that for the A in (48), the characteristic polynomial p(A, \lambda)
is \lambda^4 - a_1 \lambda^3 - a_2 \lambda^2 - a_3 \lambda - a_4.

In summary, the study of linear recurrence relations, when viewed as
(48), is a special case of the study of linear growth models.
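
The recasting in (48) can be sketched in code for a recurrence of any order k. This is an illustrative sketch, not the book's procedure: the helper names `companion_matrix` and `iterate` are hypothetical, and `iterate` assumes n is at least k - 1 and that the starting values are listed newest first.

```python
def companion_matrix(coeffs):
    """Matrix A of (48) for r_n = coeffs[0]*r_{n-1} + ... + coeffs[k-1]*r_{n-k}."""
    k = len(coeffs)
    A = [list(coeffs)]                       # first row carries the recurrence
    for i in range(k - 1):                   # remaining rows shift values down
        A.append([1 if j == i else 0 for j in range(k)])
    return A

def iterate(coeffs, start, n):
    """start = [r_{k-1}, ..., r_0]; return r_n (assumes n >= k - 1)."""
    A = companion_matrix(coeffs)
    k = len(coeffs)
    state = list(start)
    for _ in range(n - (k - 1)):
        state = [sum(A[i][j] * state[j] for j in range(k)) for i in range(k)]
    return state[0]

print(companion_matrix([1, 1]))      # the Fibonacci relation as a 2-by-2 system
print(iterate([1, 1], [1, 1], 8))    # r_8 for r_0 = r_1 = 1
```

For the Fibonacci relation this matrix lists the newest term first, so it is the matrix L of (41) with its rows and columns reversed; both have the same eigenvalues.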


Section 4.5 Exercises 


Summary of Exercises 

Exercises 1 and 2 concern Leslie population growth models. Exercise 3 looks
at eigenvalues of cyclic Markov chains. Exercises 4—10 deal with the har- 
vesting model and variations. Exercises 11—21 involve recurrence relations, 
with Exercises 11—13 about building recurrence relations. 


1. Find the long-term annual growth rate for the following Leslie growth 
models (use iteration; this growth rate is the size of the largest eigen- 
value). Also find the long-term population distribution (percentages in 
each age group). 

(a) y' = m + 2o        (b) y' = m + 2o
    m' = y                 m' = .5y
    o' = m                 o' = .5m

(c) y' = m + 4o        (d) y' = 4a + 4o
    m' = .5y               m' = .5y
    o' = .5m               a' = .5m
                           o' = .5a

(e) y' = 4o
    m' = .5y
    a' = .5m
    o' = .5a

The characteristic equation, det(L - \lambda I) = 0, for this matrix is
\lambda^3 - b d_1 d_2 = 0, or

    \lambda^3 = b d_1 d_2

Briefly describe the behavior of the system p' = Lp over time for the
cases

(a) b d_1 d_2 < 1    (b) b d_1 d_2 = 1    (c) b d_1 d_2 > 1


3. What are the eigenvalues for the Markov chain in Exercise 1 of Section
4.4? Explain that Markov chain’s behavior in terms of the eigenvalues.


4. For the harvesting model in Example 3: 
(a) If hf = 50, how large can hm be (without negative herd values, 
as happened with hf = 50, hm = 100)? 
(b) Suppose that we harvest 100 yearling males and 100 yearling fe- 
males. What is the stable herd vector now? 
(c) Suppose that we harvest 100 yearling males and 50 yearling fe- 
males. What is the stable herd vector? Does it make sense? 


5. Solve the rabbit harvesting system of equations (16) yourself by
Gaussian elimination with hm = hf = 100.


6. In the harvesting model in Example 3, explain in words why the number 
of females and males is the same when we harvest the same number of 
adult males and adult females? Will this also be true if in addition we 
harvest equal numbers of yearling males and yearling females? 


7. In the harvesting model in Example 3, explain in words why when 
hm = 100 and hf = 50 we get an impossible solution (involving a 
negative number of adult males).


8. Re-solve (16), using Gaussian elimination, with the survival probabilities
of .6 changed to .5; again let hm = hf = 100. Explain in words
the difference between your answer and the answer in (17) obtained
with survival probabilities of .6.


9. (a) Suppose that we want the rabbit population to grow by 10% during
the year. Rewrite the matrix equation (15) to reflect the fact that
x' = 1.1x. Solve the new version of (16) with hm = hf = 100.
(b) Find the inverse of the new coefficient matrix (A - 1.1I) in (16).

Note: Requires a computer program for inverses.


10. Try to solve our harvesting model by iteration with hm = hf = 100.
That is, guess an initial value for the population vector x and insert that
vector of values on the right side of (13). Use the resulting left-side
values as your next estimated x and continue iterating. Try several
different starting vectors. Do you ever get convergence to a solution?


11. Suppose that a_n, the level of radioactivity after n years from the element
linearium, decreases by 20% a year. Write a recurrence relation that
expresses this decay rate.


12. Let a_n be the number of dollars in a savings account after n years.
Suppose that money earns 10% interest a year.
(a) Write a recurrence relation to represent this interest rate.
(b) If a_0 = 100, calculate a_5.


13. Let a_n = the number of different ways for an elf to climb a sequence
of n stairs with steps of size 1 or 2. Explain why a_n = a_{n-1} + a_{n-2}.
What are a_1 and a_2? Determine a_8.


14. Solve the following recurrence relations, given the initial values.
(a) a_n = 3a_{n-1} - 2a_{n-2},  a_0 = a_1 = 2
(b) a_n = 6a_{n-1} - 8a_{n-2},  a_0 = 0, a_1 = 1
(c) a_n = 3a_{n-1} + 4a_{n-2},  a_0 = a_1 = 1
(d) a, = fs — Gand, 86 = 44> I

Hint: See Exercise 21.


15. Give an approximate formula for a_n for each recurrence relation in
Exercise 14 (use just the largest root of the characteristic equation).


16. Convert each recurrence relation in Exercise 14 into a pair of first-order
recurrence relations x_n = A x_{n-1} for x_n = [a_n, b_n].

Hint: Let b_n = a_{n-1}; see (48).
Recast the initial values from Exercise 14 into an initial-value vector x_1.
Check that the eigenvalues of each A equal the roots of the characteristic
equation for the corresponding original recurrence relation in Exercise 14.


17. Solve the recurrence relations you got in Exercise 16 using the method
at the end of Section 3.1 to get a formula for a_n.


18. Convert the following recurrence relations into a system of first-order
recurrence relations.
(a) a, = 24,1 . f,~2.- Gy—3    (b) a_n = a_{n-2} + a_{n-4}


19. Show that a_n = \alpha^n, where \alpha^2 = c_1 \alpha + c_2, will always be a solution
to the recurrence relation a_n = c_1 a_{n-1} + c_2 a_{n-2}.


20. Show that any linear combination of solutions to the recurrence relation
a_n = c_1 a_{n-1} + c_2 a_{n-2} is again a solution.


21. Verify that a_n = n \lambda^n is a solution to the recurrence relation a_n =
c_1 a_{n-1} + c_2 a_{n-2} whose characteristic equation k^2 - c_1 k - c_2 = 0
has \lambda as a double root.

Note: If k^2 - c_1 k - c_2 = 0 has \lambda as a double root, the characteristic
equation can be factored as (k - \lambda)^2 = 0. This means that c_1 = 2\lambda
and c_2 = -\lambda^2. Use these values for c_1 and c_2.


4.6 Linear Programming


Studies have shown that about 25% of all scientific computing is devoted to
solving linear programs. Linear programming is the principal tool of
management science. The object of a linear program is to optimize (maximize
or minimize) some linear function subject to a system of linear constraints.
There are hundreds of different real-world problems that can be posed as
linear programs. We start with a simple linear problem: to maximize
sales from furniture production.


Example 1.


A factory can manufacture chairs and tables. Let x_1 be the number of
chairs produced and x_2 the number of tables. Chairs sell for $40 apiece
and tables for $200 apiece. The production of x_1 chairs and x_2 tables
requires various amounts of raw materials whose supplies are limited.
The following inequalities describe the requirements and supplies.

    Wood:         x_1 +  4x_2 \le 1400
    Labor:       2x_1 +  3x_2 \le 2000        (1)
    Braces:       x_1 + 12x_2 \le 3600
    Upholstery:  2x_1         \le 1800


Subject to these constraints we want to pick x_1 and x_2 so as to maximize
the objective function of the total sales. Our model is

    Maximize 40x_1 + 200x_2        (2)
    subject to x_1 \ge 0, x_2 \ge 0 and (1)

Figure 4.12 Feasible region of linear program.


Linear programming models of energy usage in the American
economy have used thousands of variables and constraint equations
and inequalities. Linear models with close to 1,000,000 variables and
over 100,000 constraints have been developed and solved in private
industry. Such a system of equations has on the order of
100,000,000,000 (100 billion) coefficient terms. Of course, in these
large problems almost all coefficients are zero. A large mathematical
theory about linear programs has been developed with simplifying
shortcuts that make it possible to solve huge linear programs.

In Figure 4.12, we have marked (the shaded area) the feasible
region of (x_1, x_2) points that satisfy (1).

The key to solving a linear program is the following theorem. 


Theorem. A linear objective function assumes its maximum and minimum
values on the boundary of the feasible region (assuming that the
feasible region is bounded). In fact, the optimal value is achieved at
a corner point of this boundary.


Proof. Although true for all linear programs, we verify this proposition 
in the case of two variables (as in Example 1). Consider any line 
crossing through the feasible region. If we compute the values of a 
linear function along this line, we observe that as we move in one 
direction on the line, the linear function constantly increases and in 
the other direction constantly decreases. (There is one exception—the 
linear function could be constant along the line.) To find the maximum 
value of this linear function along the line, we should go as far as 
possible along the line in the direction of increasing values, that is, go 
to an end of the line segment where it meets the boundary. Thus the
maximum of a linear objective function occurs at the boundary of the
feasible region.

Next repeat this argument using a boundary line of the feasible
region. Again it pays to go to an end of the boundary line, that is, to
a corner of the feasible region.


The theorem tells us to look at the corners of the feasible region
for the optimal (x_1, x_2)-value. In theory, any pair of constraint lines
could intersect to form a corner of the feasible region, but by plotting
the constraint lines, as in Figure 4.12, we can see which pairs of lines
intersect along the boundary of the feasible region. To find the
intersection point of two constraint lines, we solve for an (x_1, x_2) point that
lies on both lines, the same old problem of two equations in two
unknowns. Table 4.5 lists the coordinates of the corners and the
associated objective function values.


Table 4.5

Corner          Intersecting                    Objective
Coordinates     Constraints                     Function

(0, 0)          x_1 \ge 0 and x_2 \ge 0               0
(0, 300)        x_1 \ge 0 and braces             60,000
(300, 275)      Braces and wood                  67,000***
(760, 160)      Wood and labor                   62,400
(900, 66.6)     Labor and upholstery             49,333
(900, 0)        Upholstery and x_2 \ge 0         36,000


So the optimal production schedule is to make 300 chairs and 275
tables, whose sales value will be $67,000.
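
The corner-checking method of this example can be sketched in a few lines of code. This is an illustrative sketch only: it intersects every pair of constraint lines from (1) (plus the two nonnegativity constraints), keeps the feasible intersection points, and evaluates the objective 40x_1 + 200x_2 at each. The dictionary layout and helper names are assumptions made for the example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is (a1, a2, b), meaning a1*x1 + a2*x2 <= b.
constraints = {
    "wood":       (1, 4, 1400),
    "labor":      (2, 3, 2000),
    "braces":     (1, 12, 3600),
    "upholstery": (2, 0, 1800),
    "x1 >= 0":    (-1, 0, 0),
    "x2 >= 0":    (0, -1, 0),
}

def intersection(c, d):
    """Solve the two boundary lines as equations (Cramer's rule); None if parallel."""
    (a1, a2, b), (e1, e2, f) = c, d
    det = a1 * e2 - a2 * e1
    if det == 0:
        return None
    return (Fraction(b * e2 - a2 * f, det), Fraction(a1 * f - b * e1, det))

def feasible(p):
    return all(a1 * p[0] + a2 * p[1] <= b for a1, a2, b in constraints.values())

corners = {}
for c, d in combinations(constraints.values(), 2):
    p = intersection(c, d)
    if p is not None and feasible(p):
        corners[p] = 40 * p[0] + 200 * p[1]   # objective value at this corner

best = max(corners, key=corners.get)
print(best, corners[best])
```

With only two variables this brute-force search is cheap; the number of pairs to test grows quickly with the number of variables and constraints, which is why a more systematic method is needed for larger problems.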


Before giving a general procedure for solving linear programs, we 
present some examples that show how to build linear programming models. 
The reader may also want to refer back to the crop ne linear program 
presented in Section 1.4. 


Example 2. Dietician’s Problem 


Suppose that a meal must contain at least 500 units of vitamin A, 1000 
units of vitamin C, 100 units of iron, and 50 grams of protein. A 
dietician has two foods for the meal, meat and fruit. Meat costs 50 
cents a unit and fruit costs 40 cents a unit. 

Each unit of meat has 20 units of vitamin A, 30 units of vitamin
C, 10 units of iron, and 15 grams of protein. Each unit of fruit has 50
units of vitamin A, 100 units of vitamin C, 1 unit of iron, and 2 grams
of protein.

The dietician wants to have the cheapest meal that satisfies the
four nutritional constraints. Let us formulate the dietician’s problem
as a linear program. Make x_1 be the number of units of meat used and
x_2 the number of units of fruit used. Then the objective is to minimize
the cost of the meal

    Minimize .5x_1 + .4x_2

subject to the nutritional lower-bound constraints

    Vitamin A:   20x_1 +  50x_2 \ge 500
    Vitamin C:   30x_1 + 100x_2 \ge 1000
    Iron:        10x_1 +     x_2 \ge 100
    Protein:     15x_1 +    2x_2 \ge 50
                 x_1 \ge 0,  x_2 \ge 0


We should note that the constraints in linear programs do not have to 


be inequalities. Later in this section we shall see how to convert inequalities 
to equations and equations to inequalities. The next example is an important 
type of linear programming problem in which the constraints are equations. 


Example 3. A Transportation Problem 


Warehouses 1, 2, and 3 have 20, 30, and 15 tons, respectively, of
chicken wings. Colleges A and B need 25 and 40 tons, respectively,
of chicken wings (to serve to students). The following table indicates
the cost of shipping a ton from a given warehouse to a given college.


                      College
                      A      B

                 1    80     45
Warehouse        2    60     55
                 3    40     65


Since the overall demand at both colleges is 25 + 40 = 65 and
the overall supply of all three warehouses is also 20 + 30 + 15 =
65, all the supplies of each warehouse must be used. The constraints
are that the total amount of chicken wings shipped from warehouse 1
must equal 20 tons; from warehouse 2, 30 tons; from warehouse 3, 15
tons; and the total amount shipped to college A must equal 25 tons
and to college B 40 tons. If x_{ij} is the number of tons shipped from
warehouse i to college j (with j = 1 for college A and j = 2 for
college B), then these constraints are

    Warehouse      x_{11} + x_{12} = 20
    equations:     x_{21} + x_{22} = 30
                   x_{31} + x_{32} = 15        (3)
    College        x_{11} + x_{21} + x_{31} = 25
    equations:     x_{12} + x_{22} + x_{32} = 40

The objective is to minimize the transportation costs. In terms of the
x_{ij}, the transportation costs are

    80x_{11} + 45x_{12} + 60x_{21} + 55x_{22} + 40x_{31} + 65x_{32}        (4)

Thus our linear program is to minimize (4) subject to (3) and x_{ij} \ge 0.




Example 4. Running a Chicken Farm


We have a farm with 5000 chickens. Each year for 3 years we must
decide how many of the chickens should lay eggs to be sold and how
many chickens should be hatching eggs to produce more chickens next
year (we assume that all chickens are hens; roosters are ignored). At
the end of 3 years, all the chickens are sold for slaughter at $2 per
bird (this includes chickens hatched during the third year). A chicken
can hatch 30 eggs in a year. The eggs from one chicken in 1 year earn
$7. It costs $2 a year in feed for chickens (no charge for chickens born
during the year). The objective is to maximize income from eggs and
from the final sale of the chickens. State this maximization problem
as a linear program.

Let x_i and y_i be the number of chickens hatching and laying eggs
for sale, respectively, in the ith year, i = 1, 2, 3. Then the following
equations represent the fact that x_i + y_i equals the total number of
chickens each year (the original number 5000 plus the numbers of new
chickens born thus far).

    x_1 + y_1 = 5000
    x_2 + y_2 = 5000 + 30x_1        (5)
    x_3 + y_3 = 5000 + 30x_1 + 30x_2

Let us rewrite (5) with all the variables on the left side and each
variable in a different column.

         x_1 + y_1                          = 5000
      -30x_1         + x_2 + y_2            = 5000        (6)
      -30x_1       - 30x_2       + x_3 + y_3 = 5000

As usual, all variables must be nonnegative.

    x_i \ge 0,  y_i \ge 0,   i = 1, 2, 3        (7)


The objective function to be maximized is

    Maximize   2(5000 + 30x_1 + 30x_2 + 30x_3)
             + 7(y_1 + y_2 + y_3)        (8)
             - 2(15,000 + 60x_1 + 30x_2)

The first term in (8) represents the sale of all chickens after 3 years,
the second term the income from eggs, and the last term the feeding
cost [the total number of chicken-years of feeding is the sum of the
right-hand sides in (5)].

The required linear program is to maximize (8) subject to (6)
and (7).
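
To see how the pieces of this model fit together, the sketch below evaluates the objective (8) for a given hatching plan (x_1, x_2, x_3), computing the laying counts y_i from the flock-size equations (5). The function name `farm_income` is hypothetical, and the sketch only evaluates plans; it does not solve the linear program.

```python
def farm_income(x1, x2, x3):
    """Income (8) for a plan with x_i hens hatching in year i; y_i follows from (5)."""
    flock1 = 5000
    flock2 = 5000 + 30 * x1
    flock3 = 5000 + 30 * x1 + 30 * x2
    y1, y2, y3 = flock1 - x1, flock2 - x2, flock3 - x3
    if min(x1, x2, x3, y1, y2, y3) < 0:
        raise ValueError("plan violates the nonnegativity constraints (7)")
    sale = 2 * (5000 + 30 * x1 + 30 * x2 + 30 * x3)  # final sale of all birds
    eggs = 7 * (y1 + y2 + y3)                        # egg income over 3 years
    feed = 2 * (flock1 + flock2 + flock3)            # feeding cost over 3 years
    return sale + eggs - feed

print(farm_income(0, 0, 0))      # every hen lays eggs for sale every year
print(farm_income(5000, 0, 0))   # every hen hatches in year 1, then all lay
```

Comparing a few plans this way gives a feel for the trade-off between selling eggs now and enlarging the flock for later years.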


We shall now develop a general procedure for solving linear programs.
The discussion will be couched in terms of the chair-table linear program
in Example 1. Our presentation will necessarily be sketchy. Readers
interested in a more extensive treatment of linear programming should turn to
any of the dozens of books on the subject (most colleges have several courses
about linear programming, offered by mathematics, economics, and business
departments).

Note that the method used in Example 1 of graphing and then checking
the corner points of the feasible region for an optimal value is not practical
for larger problems. An n-variable problem with m constraints can have up
to m^n corner points to check.

The theory of linear programming is centered about the following 
method, now called the simplex algorithm, which was developed over 30 
years ago when the advent of digital computers first made it possible to try 
to solve moderate-sized linear programs. Intuitively, the simplex algorithm 
starts at the origin (a corner of the feasible region) and moves along a 
boundary edge of the feasible region to a better corner, where the objective 
function is larger, and continues in this way until it reaches an optimal
corner.

We outline the simplex algorithm and then describe it in detail using
Example 1 to illustrate the calculations.


Simplex Algorithm 

Part 1. Let x_h be the variable with the largest positive coefficient in
the objective function. Starting at the origin, increase x_h as much as
possible until an inequality constraint is reached (while the other
variables remain equal to 0).


Part 2. Make a partial change of coordinates by replacing x_h with x_h'
so that the new corner, which we reached by increasing x_h, becomes
the origin in the new coordinates. If any coefficient in the new
objective function is positive, go back to part 1; otherwise, the new
origin is an optimal corner.

To find a better corner, the simplex algorithm looks at the objective
function and checks which coefficient c_j is largest (most positive). Let x_h be
the variable with the largest coefficient in the objective function. Then
increasing x_h yields the greatest rate of increase in the objective function; in
economic terms, activity h is the most profitable.

The simplex algorithm increases the value of x_h as much as possible,
that is, until increasing x_h any more would violate one of the inequality
constraints. If all c_j \le 0, so that increasing any x_j cannot increase the
objective function, the current origin must be an optimal corner.

In the chair-table problem, the simplex algorithm would start at the
origin (0, 0). The coefficient of x_2 is greater than the coefficient of x_1 in the
objective function (c_2 = 200 versus c_1 = 40), so x_2 would be increased
(while x_1 is kept fixed = 0). In words, we produce as many tables (variable
x_2) as possible because they are more profitable than chairs.

Recall the objective function and inequalities in this problem,

    Maximize 40x_1 + 200x_2
    x_1 \ge 0,  x_2 \ge 0

    Wood:         x_1 +  4x_2 \le 1400
    Labor:       2x_1 +  3x_2 \le 2000        (9)
    Braces:       x_1 + 12x_2 \le 3600
    Upholstery:  2x_1         \le 1800


Looking at Figure 4.12, we see that x_2 can be increased to x_2 = 300,
where the objective function is 60,000. Any greater value of x_2 would violate
the braces constraint.

Let us show how the new corner can be found algebraically (since in
larger problems, we will not be able to draw a picture of the feasible region).
We want to increase x_2 while keeping x_1 = 0. Substituting x_1 = 0 into the
inequalities of (9), we obtain

    Wood:          4x_2 \le 1400
    Labor:         3x_2 \le 2000        (10)
    Braces:       12x_2 \le 3600
    Upholstery:       0 \le 1800

We can ignore the upholstery inequality 0 \le 1800; it does not contain x_2
and will always be true. The other three inequalities in (10) are easily solved
in terms of x_2 to become

    Wood:      x_2 \le 350
    Labor:     x_2 \le 666\frac{2}{3}        (11)
    Braces:    x_2 \le 300

The smallest of the bounds on x_2, namely 300, is the amount x_2 can be
increased without violating any inequality. So again we find that (0, 300) is
the new corner point we reach by increasing x_2 as much as possible. This
corner (0, 300) is formed by the intersection of constraints x_1 = 0 and
x_1 + 12x_2 = 3600.
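
The bound computation in (10) and (11) amounts to dividing each right-hand side by the coefficient of the variable being increased and taking the smallest result. A minimal sketch, assuming all other variables are held at 0 and using a hypothetical helper name `max_increase`:

```python
from fractions import Fraction

# Constraints of (9): name -> ([coefficient of x_1, coefficient of x_2], right side).
chair_table = {
    "wood":       ([1, 4], 1400),
    "labor":      ([2, 3], 2000),
    "braces":     ([1, 12], 3600),
    "upholstery": ([2, 0], 1800),
}

def max_increase(constraints, var):
    """Largest value of variable `var` (0-based) with the other variables at 0.

    Constraints whose coefficient for `var` is not positive never block the increase.
    """
    bounds = {name: Fraction(rhs, coeffs[var])
              for name, (coeffs, rhs) in constraints.items()
              if coeffs[var] > 0}
    name = min(bounds, key=bounds.get)
    return name, bounds[name]

print(max_increase(chair_table, 1))   # increasing x_2 from the origin
```

The constraint returned is the one that becomes tight first, which is the constraint whose slack variable takes part in the pivot exchange described next.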

Next comes part 2 of the simplex algorithm. We perform a linear
transformation that changes our coordinate system by replacing x_2 with a
different coordinate variable, call it x_2'. The current corner x_1 = 0, x_2 =
300 will be transformed into the origin x_1 = 0, x_2' = 0 in this new coordinate
system.

To motivate this change of variables, we first need to show how our
system of constraint inequalities can be recast into a system of equations.
For this recasting, we must introduce additional variables. A slack variable
equals the difference between the left- and right-hand sides of an inequality.
With slack variables, the linear program in (9) becomes

    Maximize 40x_1 + 200x_2
    subject to x_1 \ge 0, x_2 \ge 0, x_3 \ge 0, x_4 \ge 0, x_5 \ge 0, x_6 \ge 0 and

    Wood:         x_1 +  4x_2 + x_3                   = 1400
    Labor:       2x_1 +  3x_2       + x_4             = 2000        (12)
    Braces:       x_1 + 12x_2             + x_5       = 3600
    Upholstery:  2x_1                           + x_6 = 1800

where x_3, x_4, x_5, x_6 are the slack variables. We call x_1, x_2 independent
variables; they are the original coordinate variables from (9). Like x_1, x_2,
a slack variable must be \ge 0.

A slack variable is what the simplex algorithm uses as x_2', the variable
to replace x_2 in the change of coordinates mentioned above. In particular,
we want to use x_5, the slack variable in the braces constraint x_1 + 12x_2 +
x_5 = 3600. Remember that the corner (0, 300) is the intersection point of
x_1 + 12x_2 = 3600 with axis line x_1 = 0. Forcing the slack variable x_5 to
be 0 is the same as forcing x_1, x_2 to satisfy the equation x_1 + 12x_2 = 3600.
The corner x_1 = 0, x_2 = 300 where lines x_1 = 0 and x_1 + 12x_2 = 3600
meet is in x_1, x_5-coordinates the origin, x_1 = 0, x_5 = 0. This is exactly
what we were looking for (see Figure 4.13).

We must rewrite the system of equations in (12) so that x_5 replaces x_2
as an independent variable, that is, as a coordinate variable. We do this by
using the elimination-by-pivoting process to eliminate x_2 from other
equations in (12). We subtract the appropriate multiple of the braces equation in
(12), x_1 + 12x_2 + x_5 = 3600, from the other equations to eliminate x_2.

The wood equation x_1 + 4x_2 + x_3 = 1400 has an x_2-coefficient of 4
while the braces equation has an x_2-coefficient of 12. So we subtract \frac{4}{12}, or
\frac{1}{3}, times the braces equation from the wood equation

    Wood:                        x_1 + 4x_2 + x_3 = 1400
    -\frac{1}{3} (Braces:        x_1 + 12x_2 + x_5 = 3600)
    New wood:       \frac{2}{3}x_1 + x_3 - \frac{1}{3}x_5 = 200        (13a)

The labor constraint becomes

    Labor:                       2x_1 + 3x_2 + x_4 = 2000
    -\frac{1}{4} (Braces:        x_1 + 12x_2 + x_5 = 3600)
    New labor:      \frac{7}{4}x_1 + x_4 - \frac{1}{4}x_5 = 1100        (13b)

The upholstery equation is unchanged, since x_2 does not occur in it.

    New upholstery:   2x_1 + x_6 = 1800        (13c)

Figure 4.13 Feasible region with x_1, x_5 coordinates.

We also need to eliminate x_2 from the objective function. To do this,
we write the braces equation as x_1 + 12x_2 + x_5 - 3600 = 0.

    Objective function:          40x_1 + 200x_2
    -\frac{200}{12} (Braces:     x_1 + 12x_2 + x_5 - 3600)        (14)
    New objective function:      \frac{70}{3}x_1 - \frac{50}{3}x_5 + 60,000

We must also rewrite the original braces constraint x_1 + 12x_2 + x_5 = 3600
to make x_2 look like a slack variable; that is, x_2 should have a coefficient of
1. Dividing by 12, we have

    New braces:   \frac{1}{12}x_1 + x_2 + \frac{1}{12}x_5 = 300        (13d)

Collecting (13a)-(13d) and (14), we have the desired transformed linear
program.

    Maximize \frac{70}{3}x_1 - \frac{50}{3}x_5 + 60,000
    subject to x_1 \ge 0, x_2 \ge 0, x_3 \ge 0, x_4 \ge 0, x_5 \ge 0, x_6 \ge 0, and

    \frac{2}{3}x_1 + x_3 - \frac{1}{3}x_5 = 200
    \frac{7}{4}x_1 + x_4 - \frac{1}{4}x_5 = 1100        (15)
    \frac{1}{12}x_1 + x_2 + \frac{1}{12}x_5 = 300
    2x_1 + x_6 = 1800

Observe that (15) has the same general form as (12), except that x_2
and x_5 have interchanged roles as independent and slack variables. Since we
have only restated the problem, a maximum for this problem is a maximum
for our original problem (12). The feasible region for our linear program in
the restated form is the "hashed" region in Figure 4.13; note that the
coordinate axes are labeled x_1 and x_5.

This sequence of computations to interchange the independent and
slack variable roles of x_2 and x_5 is called a pivot exchange. Recall from
Section 3.2 about elimination by pivoting that we used the term pivot on
entry a_{ij} to denote the process of using equation i to eliminate x_j from all
other equations (and making the coefficient of x_j be 1 in equation i). If
independent variable x_h is exchanged with the slack variable in equation g,
then we are pivoting on the coefficient of x_h in equation g.

Using the concept of a pivot, we restate the simplex algorithm as 
follows. 


Simplex Algorithm 

Part 1. Let x_h be the (independent) variable with the largest positive
coefficient in the current objective function. Increase x_h as much as
possible until an inequality constraint is reached; call this the gth
inequality.


Part 2. Perform a pivot exchange between x_h and the slack variable in
the gth constraint. If any coefficient in the new objective function
is positive, go back to part 1; otherwise, the new origin is an optimal
corner.


We can convert the constraint equations in (15) back to inequality
constraints by dropping the slack variables x_2, x_3, x_4, and x_6:

    Maximize \frac{70}{3}x_1 - \frac{50}{3}x_5 + 60,000

    \frac{2}{3}x_1 - \frac{1}{3}x_5 \le 200
    \frac{7}{4}x_1 - \frac{1}{4}x_5 \le 1100        (16)
    \frac{1}{12}x_1 + \frac{1}{12}x_5 \le 300
    2x_1 \le 1800

(see Figure 4.13).

Now we apply the two steps of the simplex algorithm to our problem
in its new form (16). In step 1 we increase x_1, which has the only positive
coefficient in the objective function, while keeping x_5 = 0. (As an aside,
we note that back in Figure 4.12, keeping x_5 = 0 means that we move along
the brace constraint line x_1 + 12x_2 = 3600.) When we set x_5 = 0 in (16),
we obtain the inequalities

    \frac{2}{3}x_1 \le 200      or   x_1 \le 300
    \frac{7}{4}x_1 \le 1100     or   x_1 \le 628\frac{4}{7}        (17)
    \frac{1}{12}x_1 \le 300     or   x_1 \le 3600
    2x_1 \le 1800               or   x_1 \le 900

The smallest of these bounds is the first. So we can increase x_1 to 300,
and our new corner is at x_1 = 300, x_5 = 0, where the first constraint
2x_1/3 - x_5/3 = 200 and constraint x_5 = 0 meet (see Figure 4.13). To see
where we are in the original problem, we can use the third constraint equation
in (15) to compute x_2's value when x_1 = 300 and x_5 = 0; we get x_2 =
275, so our new corner corresponds to the point x_1 = 300, x_2 = 275 in
Figure 4.12.


We return to the equation form (15) of the constraints. The first
constraint is 2x_1/3 + x_3 - x_5/3 = 200. Then x_3 is the slack variable for this
constraint, so in part 2 of the simplex algorithm, we perform a pivot
exchange between independent variable x_1 and slack variable x_3. [Note that
the computations in (17) could be done with the constraints in equation
form, as in (15): we simply divide the right-side value in each equation by
the coefficient of x_1 and pick the smallest positive value.]

Using the first constraint to eliminate x_1 from the other equations in
(15) and from the objective function, we obtain the new linear program.

    Maximize -35x_3 - 5x_5 + 67,000
    subject to x_i \ge 0, i = 1, 2, 3, 4, 5, 6, and

    x_1 + \frac{3}{2}x_3 - \frac{1}{2}x_5 = 300
    -\frac{21}{8}x_3 + x_4 + \frac{5}{8}x_5 = 575        (18)
    x_2 - \frac{1}{8}x_3 + \frac{1}{8}x_5 = 275
    -3x_3 + x_5 + x_6 = 1200

Since all coefficients in the current objective function are negative, the
current origin is the maximum corner. The value of the objective function
at the origin x_3 = x_5 = 0 is simply the constant term in the objective
function, 67,000. By setting x_3 = x_5 = 0 in (18), we can also directly read
off the values of x_1 (= 300) and x_2 (= 275) as well as the values of the
other slack variables.

This finishes our example of how the simplex algorithm works. Any
linear program of the general form

Maximize \(\mathbf{c} \cdot \mathbf{x} + d\) subject to \(A\mathbf{x} \le \mathbf{b}\), \(\mathbf{x} \ge \mathbf{0}\)    (19)


can be solved by the method we have just presented.

The key decision in the simplex algorithm is the choice of the pivot
entry (part 1). We pick the variable \(x_h\) with the largest positive coefficient
in the objective function. Then we divide the right-side value in each constraint
equation by the coefficient of \(x_h\) (using only equations in which that
coefficient is positive) and pick the equation with the smallest
quotient. The coefficient of \(x_h\) in that equation is the pivot entry.

Just as in Gaussian elimination, we can display the computations in
the simplex algorithm in matrix notation (omitting the variables). For the
linear program in (19) we use the augmented matrix

\[
\left[\begin{array}{cc|c}
\mathbf{c} & \mathbf{0} & -d\\
A & I & \mathbf{b}
\end{array}\right]
\tag{20}
\]

Note that the constant \(d\) is written with a minus sign, \(-d\). This is a
technicality based on the fact that \(\mathbf{b}\) is the vector of right sides of equations,
while \(d\) is on the left side (if \(d\) were fictitiously put on the "right side" of
the objective function, the constant would become \(-d\)).
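
As a rough sketch of how the augmented matrix for (19) is assembled, the following Python fragment stacks the objective row on top of the constraint rows. The function name `initial_tableau` is our own, and the data are the chair-table coefficients as they appear in the tableau for that program; treat both as assumptions of this illustration.

```python
from fractions import Fraction as F

def initial_tableau(c, A, b, d=0):
    """Return the rows of [c 0 | -d ; A I | b] as lists of Fractions."""
    m = len(A)
    rows = [[F(v) for v in c] + [F(0)] * m + [F(-d)]]
    for i in range(m):
        # Row i of the identity block carries the slack variable of constraint i.
        identity = [F(1) if k == i else F(0) for k in range(m)]
        rows.append([F(v) for v in A[i]] + identity + [F(b[i])])
    return rows

# Chair-table data: wood, labor, braces, upholstery constraints.
c = [40, 200]
A = [[1, 4], [2, 3], [1, 12], [2, 0]]
b = [1400, 2000, 3600, 1800]

T = initial_tableau(c, A, b)
for row in T:
    print([str(v) for v in row])
```

Each constraint row receives its own slack column, so the identity block \(I\) sits between \(A\) and \(\mathbf{b}\).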
Let us repeat the stages of the simplex algorithm for our chair-table
program using matrix notation. At the outset we have (where \(x_1\), \(x_2\) are the
initial independent variables)


Objective Function  |   40   200    0   0   0   0  |      0
Wood                |    1     4    1   0   0   0  |  1,400
Labor               |    2     3    0   1   0   0  |  2,000     (21)
Braces              |    1   (12)   0   0   1   0  |  3,600
Upholstery          |    2     0    0   0   0   1  |  1,800


We pick \(x_2\) (200 is the largest coefficient in the first row) and then divide each
positive entry in \(x_2\)'s column into the corresponding entry in the last column.
The smallest quotient is in the braces equation, so we pivot on entry (4, 2),
which is marked in (21). After pivoting on entry (4, 2), we obtain


Objective Function  |  70/3    0    0   0  -50/3   0  |  -60,000
Wood                |   2/3    0    1   0   -1/3   0  |      200
Labor               |   7/4    0    0   1   -1/4   0  |    1,100     (22)
Braces              |  1/12    1    0   0   1/12   0  |      300
Upholstery          |     2    0    0   0      0   1  |    1,800


Next we pick \(x_1\) and then the wood equation. So entry (2, 1) is the second
pivot. After pivoting on entry (2, 1), we obtain


Objective Function  |  0   0    -35   0    -5   0  |  -67,000
Wood                |  1   0    3/2   0  -1/2   0  |      300
Labor               |  0   0  -21/8   1   5/8   0  |      575     (23)
Braces              |  0   1   -1/8   0   1/8   0  |      275
Upholstery          |  0   0     -3   0     1   1  |    1,200


Since the objective function has no positive coefficients, we now have an
optimum with objective function value 67,000. The values of the current
independent variables, \(x_3\), \(x_5\), are zero and from (23) we read off the values
of the other variables to be


\(x_1 = 300\), \(x_2 = 275\), \(x_4 = 575\), \(x_6 = 1200\)

The name given to these augmented matrices, (21), (22), and (23), is
simplex tableaus.
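
The two pivots that carry tableau (21) to (22) and then to (23) can be reproduced with a short Python sketch. The helper `pivot` is a name we introduce for illustration; rows and columns are numbered from 0 in the code, so entry (4, 2) of the text is `(3, 1)` here.

```python
from fractions import Fraction as F

def pivot(T, r, c):
    """Pivot tableau T (a list of rows) on row r, column c; return a new tableau."""
    a = T[r][c]
    new_row = [v / a for v in T[r]]          # scale the pivot row so the pivot is 1
    out = []
    for i, row in enumerate(T):
        if i == r:
            out.append(new_row)
        else:
            q = row[c]                       # entry to be eliminated in this row
            out.append([v - q * w for v, w in zip(row, new_row)])
    return out

T21 = [[F(v) for v in row] for row in [
    [40, 200, 0, 0, 0, 0, 0],      # objective function
    [1, 4, 1, 0, 0, 0, 1400],      # wood
    [2, 3, 0, 1, 0, 0, 2000],      # labor
    [1, 12, 0, 0, 1, 0, 3600],     # braces
    [2, 0, 0, 0, 0, 1, 1800],      # upholstery
]]

T22 = pivot(T21, 3, 1)   # entry (4, 2) in the text's numbering
T23 = pivot(T22, 1, 0)   # entry (2, 1) in the text's numbering
for row in T23:
    print([str(v) for v in row])
```

The last column of the final tableau holds the negated optimum and the values 300, 575, 275, 1200 read off in the text.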
Let us now go quickly through the solution of another linear program. 


Example 5. Simplex Algorithm Applied to a
Linear Program


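Consider the following linear program for the production of sugar (\(x_1\)),
syrup (\(x_2\)), and molasses (\(x_3\)):


Maximize \(3x_1 + 4x_2 + 2x_3\)
subject to \(x_1 \ge 0\), \(x_2 \ge 0\), \(x_3 \ge 0\) and
\[
\begin{aligned}
\text{Transportation:}\quad & 2x_1 + x_2 + x_3 \le 6\\
\text{Labor:}\quad & x_1 + 2x_2 + 3x_3 \le 7\\
\text{Machinery:}\quad & 3x_1 + 2x_2 + 4x_3 \le 15
\end{aligned}
\tag{24}
\]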


In simplex tableau form with slack variables added, we have (where \(x_1\), \(x_2\),
\(x_3\) are the initial independent variables)


Objective Function  |  3   4   2   0   0   0  |   0
Transportation      |  2   1   1   1   0   0  |   6
Labor               |  1   2   3   0   1   0  |   7     (25)
Machinery           |  3   2   4   0   0   1  |  15


The pivot will be in \(x_2\)'s column. Picking the minimum of \(\{6/1, 7/2, 15/2\}\),
we take the second, from labor's row. So the pivot entry is (3, 2).
After pivoting there, we obtain

Objective Function  |    1   0    -4   0    -2   0  |  -14
Transportation      |  3/2   0  -1/2   1  -1/2   0  |  5/2     (26)
Labor               |  1/2   1   3/2   0   1/2   0  |  7/2
Machinery           |    2   0     1   0    -1   1  |    8


The next pivot will be in \(x_1\)'s column. Picking the minimum of \((5/2)/(3/2)\),
\((7/2)/(1/2)\), \(8/2\), we take the first, from transportation's row. So the pivot
entry is (2, 1). After pivoting there, we obtain


                        x1  x2     x3    x4    x5  x6
Objective Function  |    0   0  -11/3  -2/3  -5/3   0  |  -47/3
Transportation      |    1   0   -1/3   2/3  -1/3   0  |    5/3
Labor               |    0   1    5/3  -1/3   2/3   0  |    8/3     (27)
Machinery           |    0   0    5/3  -4/3  -1/3   1  |   14/3


Now there are no positive coefficients in the objective function, so we
have a maximum, where the objective function equals \(\tfrac{47}{3}\), the
independent variables \(x_3\), \(x_4\), \(x_5\) are zero, and the other variables are

\(x_1 = \tfrac{5}{3}\), \(x_2 = \tfrac{8}{3}\), \(x_6 = \tfrac{14}{3}\)
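
The whole iteration used in this example (pick the column with the largest positive objective coefficient, take the smallest quotient, pivot, repeat) can be sketched as one loop. This is a minimal illustration, not a production solver: the function name `simplex` is ours, it assumes the starting origin is feasible, and it makes no provision for the degenerate cases discussed in the appendix.

```python
from fractions import Fraction as F

def simplex(T):
    """Repeat part 1 and part 2 on tableau T until no objective coefficient is positive.
    Row 0 is the objective row; the last column holds -d and the right sides."""
    T = [[F(v) for v in row] for row in T]
    while True:
        obj = T[0][:-1]
        h = max(range(len(obj)), key=lambda j: obj[j])
        if obj[h] <= 0:
            return T                          # current corner is optimal
        rows = [i for i in range(1, len(T)) if T[i][h] > 0]
        if not rows:
            raise ValueError("objective function is unbounded")
        g = min(rows, key=lambda i: T[i][-1] / T[i][h])   # smallest quotient
        a = T[g][h]
        T[g] = [v / a for v in T[g]]
        for i in range(len(T)):
            if i != g:
                q = T[i][h]
                T[i] = [v - q * w for v, w in zip(T[i], T[g])]

example5 = [
    [3, 4, 2, 0, 0, 0, 0],     # objective function
    [2, 1, 1, 1, 0, 0, 6],     # transportation
    [1, 2, 3, 0, 1, 0, 7],     # labor
    [3, 2, 4, 0, 0, 1, 15],    # machinery
]
final = simplex(example5)
print("maximum:", -final[0][-1])
print("right sides:", [str(row[-1]) for row in final[1:]])
```

Run on the sugar, syrup, and molasses program, the loop performs the same two pivots as above and stops at the tableau (27).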


The simplex algorithm was invented in the earliest days of linear pro- 
gramming by G. Dantzig as an intuitive scheme for ‘‘walking’’ along the 
boundary of the feasible regions in search of better and better corners. Many 
more sophisticated methods have been proposed—in the 1970s Khachian’s 
algorithm made the front pages of major newspapers for its theoretical ad- 
vantages over the simplex algorithm, and in 1984 Karmarkar’s algorithm 
gained publicity for being faster than the simplex algorithm for some linear 
programs—but the simplex algorithm is still the best general-purpose way 
to solve a linear program. The explanation of why it works so well requires 
advanced mathematical analysis. 

There are many important variations in the basic theory of the simplex 
algorithm. We mention a few here and give some more in the Exercises and 
in an appendix to this section. 

First, if the problem involves minimization, we can convert it to a
maximization problem by multiplying the objective function by \(-1\)
(maximizing \(-\mathbf{c} \cdot \mathbf{x}\) is the same as minimizing \(\mathbf{c} \cdot \mathbf{x}\)).

Second, if an inequality constraint is of the form \(\mathbf{a} \cdot \mathbf{x} \ge b\), convert it
to a constraint with a \(\le\) sign by multiplying both sides by \(-1\) (to get
\(-\mathbf{a} \cdot \mathbf{x} \le -b\)).
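
As a small illustration of these two conversions (the numbers are made up for this sketch and do not come from any example in the text), the program

\[
\text{Minimize } 2x_1 + 5x_2 \quad \text{subject to } x_1 + 3x_2 \ge 6,\ x_1 \ge 0,\ x_2 \ge 0
\]

becomes

\[
\text{Maximize } -2x_1 - 5x_2 \quad \text{subject to } -x_1 - 3x_2 \le -6,\ x_1 \ge 0,\ x_2 \ge 0 .
\]

A point satisfies one set of constraints exactly when it satisfies the other, and the maximum of the second objective function is the negative of the minimum of the first. Note that the right side \(-6\) is negative, so the origin \(x_1 = x_2 = 0\) is not a feasible corner of the converted program.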

Third, we consider linear programs whose constraints are equations.
We converted the inequalities in the chair-table program to equations by
introducing slack variables [see system (12)]. For a system of equations to
be in slack variable form, we require that each equation have a slack
variable which has a coefficient of +1 and occurs only in that equation. If
a system of constraint equations is not in slack variable form, we can use
pivoting to get the system in this form. We illustrate this process with the
transportation problem in Example 3.


Example 6. Putting Transportation Constraint
Equations in Slack Variable Form


In Example 3 we obtained the following mathematical formulation of 
the problem of minimizing the cost of shipping chicken wings from 
three warehouses to two colleges. 


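Minimize \(80x_{11} + 45x_{12} + 60x_{21} + 55x_{22} + 40x_{31} + 65x_{32}\)
subject to \(x_{ij} \ge 0\) and

\[
\begin{aligned}
\text{Warehouse equations:}\quad & x_{11} + x_{12} = 20\\
& x_{21} + x_{22} = 30\\
& x_{31} + x_{32} = 15\\
\text{College equations:}\quad & x_{11} + x_{21} + x_{31} = 25\\
& x_{12} + x_{22} + x_{32} = 40
\end{aligned}
\tag{28}
\]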


We want to use pivoting to convert the five constraint equations in
(28) into an equivalent system of equations with slack variables
(variables that each occur in just one equation).

Let us make \(x_{12}\) the slack variable for the first equation. To
eliminate \(x_{12}\) from the fifth equation, we simply subtract the first
equation from the fifth equation [i.e., we pivot on entry (1, 2), the
coefficient of \(x_{12}\) in the first equation]. In a similar fashion we make \(x_{22}\)
the slack variable for the second equation by pivoting on entry (2, 4),
and we make \(x_{32}\) the slack variable for the third equation by pivoting
on entry (3, 6). After these three pivots (which involve subtracting the
first, second, and third equations from the fifth equation), we have


\[
\begin{aligned}
x_{11} + x_{12} &= 20\\
x_{21} + x_{22} &= 30\\
x_{31} + x_{32} &= 15\\
x_{11} + x_{21} + x_{31} &= 25\\
-x_{11} - x_{21} - x_{31} &= -25
\end{aligned}
\tag{29}
\]


In (29), \(x_{12}\), \(x_{22}\), and \(x_{32}\) have the form of slack variables as required.
These pivots had the effect of converting the last equation into the
negative of the fourth equation, so the last equation is redundant and can
be eliminated (the reason for this redundancy is explained in Exercise
20). Let us make \(x_{31}\) the slack variable for the fourth equation. Pivoting
on entry (4, 5), that is, subtracting the fourth equation from the third
equation, we obtain

\[
\begin{aligned}
x_{11} + x_{12} &= 20\\
x_{21} + x_{22} &= 30\\
-x_{11} - x_{21} + x_{32} &= -10\\
x_{11} + x_{21} + x_{31} &= 25
\end{aligned}
\tag{30}
\]

Now each equation has a slack variable, and we can rewrite (30) as
the following family of inequalities:

\[
\begin{aligned}
x_{11} &\le 20\\
x_{21} &\le 30\\
-x_{11} - x_{21} &\le -10 \quad (\Leftrightarrow\ x_{11} + x_{21} \ge 10)\\
x_{11} + x_{21} &\le 25
\end{aligned}
\tag{31}
\]

In the pivoting process, we also have to restate the objective function
in terms of just \(x_{11}\) and \(x_{21}\) (see Exercise 21 for details). Note that the
origin in \(x_{11}\), \(x_{21}\) coordinates is not in the feasible region.

We have reduced this problem in five equations and six unknowns
to an easy problem of four inequalities in two variables.
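
The row subtractions of this example can be checked with a short Python sketch. It is an illustration only: the helper name `subtract` and the 0-based row numbering are our own choices, and the rows are the five equations of (28) written as coefficient lists.

```python
from fractions import Fraction as F

# Columns: x11, x12, x21, x22, x31, x32 | right side. Rows follow (28).
rows = [[F(v) for v in r] for r in [
    [1, 1, 0, 0, 0, 0, 20],   # warehouse 1
    [0, 0, 1, 1, 0, 0, 30],   # warehouse 2
    [0, 0, 0, 0, 1, 1, 15],   # warehouse 3
    [1, 0, 1, 0, 1, 0, 25],   # college 1
    [0, 1, 0, 1, 0, 1, 40],   # college 2
]]

def subtract(target, source):
    """Replace row `target` by (row target) - (row source)."""
    rows[target] = [t - s for t, s in zip(rows[target], rows[source])]

# Pivots on x12, x22, x32: subtract the three warehouse rows from the last row.
for k in (0, 1, 2):
    subtract(4, k)
redundant = all(u == -v for u, v in zip(rows[4], rows[3]))

# Pivot on x31: subtract the fourth equation from the third, then drop the last row.
subtract(2, 3)
system30 = rows[:4]

print("last row is the negative of the fourth:", redundant)
for r in system30:
    print([str(v) for v in r])
```

The printed rows are the coefficients of the four equations that remain once the redundant fifth equation is dropped.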


Sensitivity Analysis 


We conclude this section with a brief discussion of the sensitivity of the 
solution of our linear program to changes in the input values in the con- 
straints. In many economics applications, this sensitivity analysis to changes 
in the input is almost as important as solving the linear program. 


Example 7. Sensitivity Analysis in
Chair-Table Production


The final (slack-variable) form of our linear program when an optimum 
was obtained by the simplex algorithm was 


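Maximize \(-35x_3 - 5x_5 + 67{,}000\)
subject to \(x_i \ge 0\), \(i = 1, 2, 3, 4, 5, 6\), and

\[
\begin{aligned}
x_1 + \tfrac{3}{2}x_3 - \tfrac{1}{2}x_5 &= 300\\
-\tfrac{21}{8}x_3 + x_4 + \tfrac{5}{8}x_5 &= 575\\
x_2 - \tfrac{1}{8}x_3 + \tfrac{1}{8}x_5 &= 275\\
-3x_3 + x_5 + x_6 &= 1200
\end{aligned}
\tag{32}
\]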


Here \(x_3\) and \(x_5\) are the current independent variables whose origin
\(x_3 = x_5 = 0\) is the optimal corner. These are the slack variables for
the original wood and braces constraints. So \(x_3 = x_5 = 0\) in the optimal
solution means that the optimal production schedule will use all the
wood and all the braces. To determine the values of the original
independent variables and other slack variables, we simply set \(x_3 = x_5 = 0\)
in (32) and read off the values of the remaining variables in each
equation:


\(x_1 = 300\), \(x_2 = 275\), \(x_4 = 575\), \(x_6 = 1200\)    (33)


As we had found earlier, we make 300 chairs (\(x_1\)) and 275 tables
(\(x_2\)). The labor slack variable (\(x_4\)) is 575 and the upholstery slack
variable (\(x_6\)) is 1200. These slack-variable values are the amounts of
these two inputs that are unused in the optimal solution. We have an
excess supply of labor and upholstery. Thus moderate decreases in the
amount of labor or upholstery available will not affect our solution.

What about changes in the input materials we do use, wood and
braces? This is a place where the simplex algorithm really shines. First
it is convenient to solve for \(x_1\) and \(x_2\) in the first and third equations,
which involve \(x_1\) and \(x_2\), respectively.

\[
\begin{aligned}
x_1 &= 300 - \tfrac{3}{2}x_3 + \tfrac{1}{2}x_5\\
x_2 &= 275 + \tfrac{1}{8}x_3 - \tfrac{1}{8}x_5
\end{aligned}
\tag{34}
\]


Having 1 less unit of wood (1399 units instead of 1400) is equivalent
to increasing the wood slack variable \(x_3\) from 0 to 1. To determine
the effects of 1 less unit of wood, we simply set \(x_3 = 1\), while \(x_5 = 0\),
in (34). From (34) we have \(x_1 = 300 - \tfrac{3}{2}\) and \(x_2 = 275 + \tfrac{1}{8}\). The
new value of the objective function with 1 less unit of wood is also
obtained by setting \(x_3 = 1\) in the objective function (while \(x_5 = 0\)).
We have \(-35 + 67{,}000 = 66{,}965\).

In summary, the coefficients of \(x_3\) in (34) and in the objective
function give the effect of 1 less unit of wood: chair production \(x_1\) will
decrease by \(\tfrac{3}{2}\), table production \(x_2\) will increase by \(\tfrac{1}{8}\), and profit will
decrease by $35. If we had 1 more unit of wood (\(x_3 = -1\)), the
opposite occurs. Chair production increases by \(\tfrac{3}{2}\), table production
decreases by \(\tfrac{1}{8}\), and profit increases by $35. It is left as an exercise to
the reader to evaluate the effects of changing the number of braces.

In economics, the increase in the profit caused by using 1 more
unit of an input is called the marginal value of the input. In this case
it means that we should be willing to pay $35 for 1 additional unit of
wood because that is the value of wood to us in increasing our sales.
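
The marginal value of wood can be double-checked by recomputing the optimal corner directly. The sketch below assumes that the wood and braces constraints, \(x_1 + 4x_2 = 1400\) and \(x_1 + 12x_2 = 3600\) (coefficients as they appear in the initial chair-table tableau), remain the two binding constraints after the change; the function names are ours.

```python
from fractions import Fraction as F

def corner(wood, braces):
    """Solve x1 + 4*x2 = wood and x1 + 12*x2 = braces by Cramer's rule."""
    det = F(1 * 12 - 4 * 1)
    x1 = (wood * 12 - 4 * braces) / det
    x2 = (braces - wood) / det
    return x1, x2

def profit(x1, x2):
    """Objective function of the chair-table program."""
    return 40 * x1 + 200 * x2

base = corner(F(1400), F(3600))   # original supplies
less = corner(F(1399), F(3600))   # 1 less unit of wood
change = (less[0] - base[0], less[1] - base[1], profit(*less) - profit(*base))
print("chairs, tables, profit change:", [str(v) for v in change])
```

The three differences agree with the coefficients of the wood slack variable read from (34) and from the objective function.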



Section 4.6 Exercises 


Summary of Exercises 

Exercises 1-12 involve converting a "word problem" into a linear program
(some of these problems previously appeared in Section 1.4); solutions, if
requested, are to be obtained by graphing. Exercises 13-17 require one to
solve linear programs. Exercises 18-21 ask for transportation problems to
be converted into slack-variable form. Exercises 22-25 involve sensitivity
analysis. Exercise 26 illustrates duality theory.


1. Suppose that a building supervisor must hire 18-year-olds and
35-year-olds to do the following jobs: clean 500 windows, empty 800
wastepaper baskets, and mop 8000 square feet of floors. In 1 day, an
18-year-old can clean 50 windows, empty 100 baskets, and mop 500 square
feet of floors. A 35-year-old can clean 100 windows, empty 100 baskets,
and mop 700 square feet of floors. An 18-year-old gets $40 a day
and a 35-year-old gets $55 a day. Formulate the problem of minimizing
the cost of hiring workers to do the required work as a linear program.
(Set up; do not solve.)


2. The Arizona tile company manufactures three types of tiles, plain,
regular, and fancy, in two different factories. Factory A produces 3000
plain, 2000 regular, and 1000 fancy tiles a day and costs $2000 a day
to operate. Factory B produces 2000 plain, 4000 regular, and 2000 fancy
tiles a day and costs $3000 a day to operate. Write down, but do not
solve, a linear program for determining the least-cost way to produce
at least 20,000 plain, at least 30,000 regular, and at least 10,000 fancy
tiles.


3. A farmer has 400 acres on which he can plant any combination of two
crops, barley and rye. Barley requires 5 worker-days and $15 of capital
for each acre planted, while rye requires 3 worker-days and $20 of
capital for each acre planted. Suppose that barley yields $40 per acre
and rye $30 per acre. The farmer has $4000 of capital and 500
worker-days of labor available for the year. He wants to determine the most
profitable planting strategy.

(a) Formulate this problem as a linear program.

(b) Plot the feasible region and find the corner point that maximizes
profit.


4. Suppose that a Bored Motor Company factory requires 7 units of metal,
20 units of labor, 3 units of paint, and 8 units of plastic to build a car,
while it requires 10 units of metal, 24 units of labor, 3 units of paint,
and 4 units of plastic to build a truck. A car sells for $6000 and a truck
for $8000. The following resources are available: 2000 units of metal,
5000 units of labor, 1000 units of paint, and 1500 units of plastic.

(a) State the problem of maximizing the value of the vehicles produced
with these resources as a linear program.

(b) Plot the feasible region of this linear program and solve by checking
the objective function at the corners (by looking at the objective
function, you should be able to tell which corners are good
candidates for the maximum).


5. Consider the two-refinery problem.


Heating oil: 20x, + 6x, = 500 
Diesel oil: $x, + 15x, = 750 
Gasoline: 4x, + 6x, = 1000 
Suppose that it costs $30 to refine a barrel in refinery 1 and $25 a barrel
in refinery 2. What is the production schedule (i.e., values of \(x_1\), \(x_2\))
that minimizes the cost while producing at least the amounts demanded
of each product (i.e., at least 500 gallons of heating oil, etc.)? Solve
by the method in Exercise 3.


6. Plot the boundary of the feasible region for the linear program in Ex- 
ample 2 and solve this linear program as discussed in Exercise 3. 


7. Formulate the following transportation problem as a linear program (see 
Example 3); do not solve. There are three factories A, B, and C that 
ship motors to three stores 1, 2, and 3. Factory A makes 1000 motors, 
B makes 2000 motors, and C makes 3000 motors. Store 1 needs 1500 
motors, store 2 needs 2000 motors, and store 3 needs 2500 motors. The 
following matrix gives the costs of shipping a motor from a given 
factory to a given store. 


Factory 
A BC 
| je em Ao 
Ee ame I 
SE rte 3 


8. There are four boys and four girls and we wish to pair them off in a
fashion that minimizes the sum of the personality conflicts in the
matches. Entry (i, j) in the following matrix gives a measure of conflict
when girl i is matched with boy j. Set up this problem as a linear
program (see the hint at the end of Exercise 9).


Boys 

we 2 See 

A oe al 

Bit y s. 0 ge 9 
Girl 

ek EAI 13.43 

Die iy a 


9. We wish to assign each person to a different job so as to minimize the
total amount of time that must be spent to get all the jobs done. Entry
(i, j) in the following matrix tells how many hours it takes person i to
do job j (a dash "—" means that the person cannot do the job). Set
up this problem as a linear program.


Hint: This is a special form of transportation problem in which the
"demands" of jobs and "supplies" of people are all equal to 1.
Jobs
ey AP be a re 
Aij9 4 — — 7 
Bi— 4 6 2 4 
People C|}/6 5 4 — — 
DWS: VS oS. 906 
E|— — 6 4 5 


10. We have a farm with 200 cows and capital of $5000 with the following
idealized conditions. Cows produce milk for sale or milk to nurse two
yearlings (which in a year become cows). A cow can generate $500
worth of milk in a year if not nursing. It costs $300 to feed a cow for
a year (no matter what its milk is used for). Write a linear program to
maximize the total income over 3 years for the farm (be sure not to
spend more money in a year than you currently have).


11. An investor has money-making activities A and B available at the start
of each of the next 5 years. Each dollar invested at the start of a year 
in A returns $1.40 two years later (in time for immediate reinvestment). 
Each dollar invested in B returns $1.70 three years later. There are in 
addition activities C and D that are only available once. Each dollar 
invested in C at the start of the second year returns $2.00 four years 
later, and each dollar invested in D at the start of the fifth year returns 
$1.30 in 1 year. The investor begins with $10,000 and she wants an 
investment plan that maximizes the gain at the end of 5 years. Give a 
linear program model for this problem. 


12. The Expando Manufacturing Co. wishes to enlarge its capacity over the
next six periods to produce umbrellas so as to maximize available ca- 
pacity at the beginning of the seventh period. Each umbrella produced 
in a period requires d dollars input and one unit of plant capacity; an 
umbrella yields r dollars revenue at the start of the next period. In each 
period, Expando can expand capacity using two construction methods, 
A and B. A requires b dollars per unit and takes one period; B requires 
c dollars per unit and takes two periods. Expando has D dollars initially 
to finance production and expansion (in no period can more money be 
spent than is available). The capacity initially is K. Formulate a linear 
program to maximize production capacity in period 7. 


13. Work through the simplex algorithm for the linear program in Example
5 to verify systems (26) and (27).


14. Solve the following linear programs using the simplex algorithm.

(a) Maximize 3x, + 2x, (b) Maximize 4x, + 6x, 
subject to subject to 
x, =0, x =0 Fi Oy, Xa ee O 
Dk, oF 425.512 3x, + 4x, = 12 
4x, + 3x, = 12 % + 2k, => F 
x, + 2%.= 8 at, t+ %&» = 6 
(c) Maximize x, + x, (d) Maximize 3x, + 2x, 
subject to subject to 
Liz x= yee, ey =U 
Se, A es S19 oh) = X= O 
2x, + 3x, = 15 Bothim=4 
2K5 akg Se T2 


SX, + x, =3 


15. Solve the following linear programs using the simplex algorithm.
(a) Maximize \(2x_1 + 3x_2 + 4x_3\)     (b) Maximize \(3x_1 + x_2 + 2x_3\)


subject to subject to 
coo GO, tS SV SSS 4 =e UY OR 
xr 2 FX STO 2k, 4 Shy FX 
SE Kon kha: ae he Mer xX FF 2x5 
ty cb Ket zi, 6 Meo 2X5 bh (X55 
(c) Maximize 3x, + 4x, + 2x, (d) Maximize 2x, + x, + 

subject to subject to 
Fey Hye = Gea, 6:20, xX, 
3X, + 2X, + 4% = 15 BX; + 3X, + 3x; = 
Byikt obte eS ae Kut tke (Xa 
2X © ke + oes, 6 Mock «Fara LR = 


16. Solve the following linear program using the simplex algorithm. 


Maximize 2x, + 4x, + x, + x, 
subject to | 
Me en Ha ee Bee Me 24K) 
Aer: ite be2kp iar 12 
ky + X_. FZ, =’2Z0 
255% Xm. Hr Gks = 16 


17. Solve the following linear program using the simplex algorithm. 


Maximize 15x, + 28x, + 19x, + 24x, + 34x, 
subject to 


= 0 
15 
\(x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0,\ x_5 \ge 0\)
KX, Hilde t ky ROO ee = 90 

2h, & Xe Xe PF Ait xX 

2b ie Ae Bae or ekeee BO 

Sy E28 ey PSs S190 


18. Put the transportation problem in Exercise 7 in slack variable form as
shown in Example 6. Also express the problem in inequality constraints.


19. Put the dating problem in Exercise 8 in slack-variable form as shown
in Example 6. Also express the problem in inequality constraints. 


20. (a) Show that the last equation in the transportation problem in
Example 6 is redundant by verifying that the sum of the warehouse
equations equals the sum of the college equations; hence the sum
of the warehouse equations minus the first college equation will
equal the second college equation.

(b) Explain in words why any nonnegative solution to the first four 
equations in (28) would also have to satisfy the fifth equation. 


21. (a) Use the equations in (30) to rewrite the objective function for the
transportation problem in Example 6 in terms of just \(x_{11}\) and \(x_{21}\).

(b) Now solve the transportation problem graphically using the linear 
program in (31) with the objective function from part (a). 


22. Repeat the sensitivity analysis in Example 7 of the chair-table
production problem for braces: How would profit change with 1 less unit of
braces, and how would the number of chairs and number of tables change?


23. Perform sensitivity analysis on the farming linear program in Example
4 of Section 1.4. What are the effects on profit and on amounts of corn
and wheat planted if 1 less acre is planted? If 1 less dollar of capital is
available?


24. Solve the farming problem in Exercise 3 using the simplex algorithm
and perform a sensitivity analysis on all constraints that are fully used.
For example, determine the effect of having 1 less dollar of capital.


25. Solve the car-truck production problem in Exercise 4 using the simplex
algorithm and perform a sensitivity analysis on all constraints that are 
fully used. 


26. Consider the following two linear programs.


(1) Maximize 3x, + 3x, (ii) Minimize 10x, + 8x, 
subject to subject to 
xy OU x= x = 0, xt = 0 
Sk, 4 2 = 10 3x, + 4%, = 3 


4x, + X= 8 245 Xess 
Solve them by graphing and show that the optimum values of these two
objective functions are the same. What relations link the input data in
the two problems?


Appendix to Section 4.6: Linear Programming Details


Simplex Algorithm for a General 
Linear Program 


We use the following notation involving n variables and m inequality con- 
straints. 


General Linear Program—Inequality Form 


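Maximize \(c_1x_1 + c_2x_2 + \cdots + c_nx_n + d\)
subject to \(x_1 \ge 0, x_2 \ge 0, \ldots, x_n \ge 0\) and
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1h}x_h + \cdots + a_{1n}x_n &\le b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2h}x_h + \cdots + a_{2n}x_n &\le b_2\\
&\;\;\vdots\\
a_{g1}x_1 + a_{g2}x_2 + \cdots + a_{gh}x_h + \cdots + a_{gn}x_n &\le b_g\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mh}x_h + \cdots + a_{mn}x_n &\le b_m
\end{aligned}
\tag{1a}
\]


or in matrix notation,


Maximize \(\mathbf{c} \cdot \mathbf{x} + d\)
subject to \(A\mathbf{x} \le \mathbf{b}\), \(\mathbf{x} \ge \mathbf{0}\)    (1b)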


Upon restating the inequality constraints in (1a) as equations with slack 
variables, the system is 


General Linear Program—Equation Form 


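Maximize \(c_1x_1 + c_2x_2 + \cdots + c_nx_n + d\)
subject to \(x_1 \ge 0, x_2 \ge 0, \ldots, x_n \ge 0, x_{n+1} \ge 0, \ldots, x_{n+m} \ge 0\) and
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1h}x_h + \cdots + a_{1n}x_n + x_{n+1} &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2h}x_h + \cdots + a_{2n}x_n + x_{n+2} &= b_2\\
&\;\;\vdots\\
a_{g1}x_1 + a_{g2}x_2 + \cdots + a_{gh}x_h + \cdots + a_{gn}x_n + x_{n+g} &= b_g\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mh}x_h + \cdots + a_{mn}x_n + x_{n+m} &= b_m
\end{aligned}
\tag{2a}
\]
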


The first n variables are our independent variables from (1a) and the last m
variables are the slack variables. Keep in mind that after iterations of the
simplex algorithm, the positions of variables that play the role of slack
variables become totally scrambled. (In linear programming texts, the set
of slack variables is called the basis of a linear program in equation form.)

If we let \(\mathbf{x}^* = [\mathbf{x}, \mathbf{x}']\), where \(\mathbf{x}' = [x_{n+1}, \ldots, x_{n+m}]\) is the vector
of slack variables, and \(A^* = [A \;\; I]\), then (2a) can be written


Maximize \(\mathbf{c} \cdot \mathbf{x} + d\)
subject to \(A^*\mathbf{x}^* = \mathbf{b}\), \(\mathbf{x}^* \ge \mathbf{0}\)    (2b)


It is important to note that the simplex algorithm assumes that the
origin of the current independent variables is a feasible corner. This is the
solution obtained by setting all independent variables equal to 0 and setting
the slack variable for row i equal to \(b_i\). If the origin is not feasible, see the
section "Finding an Initial Feasible Solution."

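Now we state the general simplex algorithm. In part 1 we find the
independent variable \(x_h\) with the largest positive coefficient \(c_h\) in the objective
function. This is the variable whose increase provides the greatest rate of
increase in the objective function. If no coefficient \(c_j\) is positive, the current
corner is optimal.

Next we determine how much we can increase \(x_h\) (while all other
independent variables remain = 0). The inequalities in (1a) reduce to

\[
\begin{aligned}
a_{1h}x_h \le b_1 \quad&\longrightarrow\quad x_h \le b_1/a_{1h}\\
a_{2h}x_h \le b_2 \quad&\longrightarrow\quad x_h \le b_2/a_{2h}\\
&\;\;\vdots\\
a_{gh}x_h \le b_g \quad&\longrightarrow\quad x_h \le b_g/a_{gh}\\
&\;\;\vdots\\
a_{mh}x_h \le b_m \quad&\longrightarrow\quad x_h \le b_m/a_{mh}
\end{aligned}
\tag{3}
\]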


Then we can increase \(x_h\) by the minimum of these bounds. Let \(t_h\) be this
amount. If some \(a_{ih} \le 0\), then \(x_h\) can increase indefinitely without violating
the ith constraint. So we only want to examine constraints with \(a_{ih} > 0\). If
there is no positive \(a_{ih}\), then \(x_h\) can increase indefinitely without violating
any constraint; this "pathological" situation is mentioned later.
Summarizing, we have

Part 1 of Simplex Algorithm


1. Let \(x_h\) be the independent variable with the largest positive coefficient \(c_h\)
in the objective function. If all \(c_j \le 0\), the current corner is optimal.
2. Determine the row index i that achieves the minimum value for the ratio
\(b_i/a_{ih}\) when \(a_{ih} > 0\). Let g be the minimizing i.
3. (a) If all \(a_{ih} \le 0\), \(x_h\) can be increased infinitely and the problem is
unbounded. See Abnormal Possibilities II below.
(b) If the minimizing ratio \(b_g/a_{gh}\) equals 0, that is, \(b_g = 0\) (and every
other row i has \(b_i \ge 0\) or \(a_{ih} \le 0\)), special steps must be taken. See
Abnormal Possibilities III below.


The intersection of the gth constraint with the constraints \(x_j = 0\),
\(j \ne h\), forms the new corner \((0, 0, \ldots, t_h, \ldots, 0)\).
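
Part 1 can be sketched as a small Python function that reports which of the three outcomes occurs. The name `choose_pivot` and the returned labels are our own conventions for this illustration, indices are 0-based, and the degenerate case of step 3(b) is not treated specially here.

```python
from fractions import Fraction as F

def choose_pivot(c, A, b):
    """Part 1: return ('optimal', None), ('unbounded', h), or ('pivot', (g, h))."""
    h = max(range(len(c)), key=lambda j: c[j])     # largest objective coefficient
    if c[h] <= 0:
        return ("optimal", None)                   # step 1: no positive coefficient
    candidates = [i for i in range(len(A)) if A[i][h] > 0]
    if not candidates:
        return ("unbounded", h)                    # step 3(a): no positive a_ih
    g = min(candidates, key=lambda i: F(b[i]) / F(A[i][h]))   # step 2: minimum ratio
    return ("pivot", (g, h))

# Chair-table data: the pivot is in the braces row (index 2), x2's column (index 1).
print(choose_pivot([40, 200], [[1, 4], [2, 3], [1, 12], [2, 0]],
                   [1400, 2000, 3600, 1800]))
```

Returning a label rather than raising an error keeps the three outcomes of part 1 visible to the caller.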

Part 2 of the simplex algorithm requires us to rewrite the equations in
(2) to make this new corner become the origin of the new independent
variables. We do this by performing a pivot exchange between independent
variable \(x_h\) and the slack variable \(x_{n+g}\) in equation g; that is, we pivot on
entry \(a_{gh}\).


Part 2 of Simplex Algorithm: Pivoting on \(a_{gh}\)


1. The old gth constraint equation is rewritten to make \(x_h\) become a slack
variable, that is, make \(x_h\) have coefficient 1. This is accomplished by
dividing the equation by \(a^* = a_{gh}\).

2. Set the column of coefficients for \(x_h\) equal to \(\mathbf{e}_g\) (all 0's except in the gth
equation). This is accomplished by the standard elimination by pivoting
process.

3. The objective function undergoes the same change as in step 2 (\(x_h\) drops
out and \(x_{n+g}\) comes in).


The following diagram illustrates steps 1, 2, and 3. Suppose that
$a^* = a_{gh}$ is again the pivot entry, $p = a_{gj}$ is another coefficient in the pivot
row, $q = a_{ih}$ is another coefficient in the pivot column, and $r = a_{ij}$ is the
coefficient in $q$'s row and $p$'s column (possibly $q$ and $r$ are in the objective
function, or possibly $p$ and $r$ are $b_i$'s). Then part 2 of the simplex algorithm
has the form

Before pivoting:

$$
\begin{array}{c|cc}
 & \text{column } h & \text{column } j \\ \hline
\text{row } g & a^* & p \\
\text{row } i & q & r
\end{array}
\tag{4}
$$

After pivoting:

$$
\begin{array}{c|ccc}
 & \text{column } h & \text{column } j & \text{column } g' \\ \hline
\text{row } g & 1 & p/a^* & 1/a^* \\
\text{row } i & 0 & r - pq/a^* & -q/a^*
\end{array}
\tag{5}
$$


If $r$ is the constant $-d$ in the objective function, pivoting changes $-d$
to $-d - pq/a^*$ (the objective function constant increases from $d$ to
$d + pq/a^*$). In the display above we have listed column $g'$, the column of the
new independent variable $x_{n+g}$, which previously was the slack variable in
row $g$.
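
The update $r \to r - pq/a^*$ can be sketched as a short Python routine. It is a minimal sketch, not a full simplex implementation: the name `pivot` and the tableau layout (each row holds the coefficients, the slack columns, and the right-hand side) are assumptions made for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def pivot(T, g, h):
    """Pivot the tableau T (list of rows) on entry (g, h); returns a new tableau."""
    a_star = Fraction(T[g][h])
    if a_star == 0:
        raise ZeroDivisionError("pivot entry must be nonzero")

    # Step 1: divide the pivot row by a*, so x_h gets coefficient 1.
    new_g = [Fraction(v) / a_star for v in T[g]]

    out = []
    for i, row in enumerate(T):
        if i == g:
            out.append(new_g)
        else:
            # Step 2: r -> r - q * (p / a*), where q is this row's entry in column h.
            q = Fraction(row[h])
            out.append([Fraction(r) - q * p for r, p in zip(row, new_g)])
    return out
```

If the slack columns are carried along in `T`, the column of the slack variable for row $g$ automatically becomes the column $g'$ of display (5): $1/a^*$ in row $g$ and $-q/a^*$ in the other rows.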

After pivoting, we return to part 1 of the simplex algorithm for the
linear program in this new form and continue doing part 1 and part 2 until
no improvement in the objective function is possible in part 1. Then we
shall have found the maximum.


Abnormal Possibilities 


No Feasible Points. The feasible region for a linear program may be empty.
That is, no point lies on the correct side of all the inequalities. For example,
to satisfy $x_1 + x_2 \le -1$, either $x_1$ or $x_2$ must be negative, but $x_j \ge 0$ is
required in all linear programs. A linear program is called infeasible in this
case.


Unbounded Feasible Region. When we increase $x_h$ in part 1, perhaps $x_h$
can be increased without limit (to infinity). See step 3(a) of part 1 in the
simplex algorithm. This means that the feasible region is unbounded along
the $x_h$-axis and the maximum value of the objective function is infinite. If a
practical problem has been misformulated (or data not entered correctly),
this difficulty can arise. When this happens, the linear program is called
unbounded.


Degeneracy. In very rare cases, it can happen that in part 1 replacing an
independent variable by a slack variable does not increase the objective
function because some constraint has a zero right-hand side (see step 3(b)
of part 1 in the simplex algorithm). This can happen without being at an
optimal point. This phenomenon is called degeneracy. Since it is rare and
requires special methods to handle, we shall ignore degeneracy in this book.




Finding an Initial Feasible Solution 


How do we start the simplex algorithm if the origin [setting all the
independent variables in (1a) equal to 0] is not a corner of the feasible region?
For example, consider a two-variable problem with constraints

$$
\begin{aligned}
2x_1 + x_2 &\ge 4 \\
x_1 - 3x_2 &\le -6
\end{aligned}
\tag{6}
$$

or, multiplying the first inequality by $-1$,

$$
\begin{aligned}
-2x_1 - x_2 &\le -4 \\
x_1 - 3x_2 &\le -6
\end{aligned}
\tag{7}
$$

Neither of these inequalities is satisfied by $x_1 = x_2 = 0$. Let us put
the inequalities of (7) in equation form with slack variables $x_3$, $x_4$.

$$
\begin{aligned}
-2x_1 - x_2 + x_3 &= -4 \\
x_1 - 3x_2 + x_4 &= -6
\end{aligned}
\tag{8}
$$

As in (7), setting $x_1 = x_2 = 0$ cannot yield a solution of (8), since we
require $x_3 \ge 0$, $x_4 \ge 0$.

There is a standard "trick" for finding a starting feasible corner (and
associated set of independent variables) when the origin, $x_1 = x_2 = 0$, is
infeasible. We add a new equality-violation variable on the left-hand side
of each constraint with a negative right-hand side. For the system (8), we
introduce $x_5$, $x_6$.

$$
\begin{aligned}
-2x_1 - x_2 + x_3 - x_5 &= -4 \\
x_1 - 3x_2 + x_4 - x_6 &= -6
\end{aligned}
\tag{9}
$$

where $x_5 \ge 0$, $x_6 \ge 0$. Let us multiply (9) by $-1$ to get

$$
\begin{aligned}
2x_1 + x_2 - x_3 + x_5 &= 4 \\
-x_1 + 3x_2 - x_4 + x_6 &= 6
\end{aligned}
\tag{10}
$$

Now $x_1 = x_2 = x_3 = x_4 = 0$ yields a feasible solution to (10), with
$x_5 = 4$ and $x_6 = 6$. We can apply the simplex algorithm to (10).

Next we define a contrived objective function that makes us look for
a corner where the equality-violation variables become zero, that is, a corner
satisfying our original equations (8). The linear program we want to solve,
with the simplex algorithm, is

$$
\begin{aligned}
&\text{Maximize } -x_5 - x_6 \\
&\text{subject to (10) and } x_j \ge 0
\end{aligned}
\tag{11}
$$

Since $x_5$, $x_6$ are nonnegative, the best possible maximum for (11) would
be 0, occurring when $x_5 = x_6 = 0$. The values of $x_1$, $x_2$, $x_3$, and $x_4$ when
$x_5 = x_6 = 0$ will be the starting feasible solution we need to use the simplex
algorithm on the original problem. If we solve the linear program (11) and
do not get a maximum of 0, then the original system of constraints was
infeasible.

Note that since $x_5$ and $x_6$ are slack variables, we must rewrite the
objective function in (11) in terms of the independent variables $x_1$, $x_2$, $x_3$,
$x_4$. Reading off expressions for $x_5$ and $x_6$ from (10), we have the linear
program

$$
\begin{aligned}
\text{Maximize } &(2x_1 + x_2 - x_3 - 4) + (-x_1 + 3x_2 - x_4 - 6) \\
&= x_1 + 4x_2 - x_3 - x_4 - 10 \\
\text{subject to } &\text{(10) and } x_j \ge 0
\end{aligned}
\tag{12}
$$
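
As a quick check of the algebra leading to (12), the following Python sketch solves (10) for $x_5$ and $x_6$ and compares $-x_5 - x_6$ with the rewritten objective. The function names are hypothetical and the test points are arbitrary; the point $(0, 4, 0, 6)$ is one that happens to drive both violation variables to zero.

```python
def violation_vars(x1, x2, x3, x4):
    """Solve the two equations of (10) for the equality-violation variables."""
    x5 = 4 - 2 * x1 - x2 + x3
    x6 = 6 + x1 - 3 * x2 + x4
    return x5, x6


def contrived_objective(x1, x2, x3, x4):
    """Objective of (12), written in the independent variables only."""
    return x1 + 4 * x2 - x3 - x4 - 10


# At the origin the violation variables take the values quoted after (10).
print(violation_vars(0, 0, 0, 0))         # (4, 6)
# A point where both violations vanish, so (8) is satisfied there.
print(violation_vars(0, 4, 0, 6))         # (0, 0)
print(contrived_objective(0, 4, 0, 6))    # 0
```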


Matrix Representation of Pivoting and Revised Simplex Method


The pivoting operation of the simplex algorithm can be expressed as a matrix
product. Let the linear program be written in equation form with slack
variables as in (2):

$$
\begin{aligned}
&\text{Maximize } \mathbf{c} \cdot \mathbf{x} + d \\
&\text{subject to } A^*\mathbf{x}^* = \mathbf{b}, \quad \mathbf{x}^* \ge 0
\end{aligned}
$$

We can obtain the new coefficient matrix after pivoting on entry $a_{gh}$
as the matrix product $PA^*$, where $P$ is the $m$-by-$m$ "pivot" matrix. Since
$A^* = [A \;\; I]$, where $I$ is the $m$-by-$m$ identity matrix, we can write

$$PA^* = P[A \;\; I] = [PA \;\; P] \tag{13}$$

From (13) we see that the matrix $P$, assuming that $P$ exists, is just the
$m$-by-$m$ submatrix formed by the last $m$ columns (the columns of the slack
variables) after pivoting is performed. As noted earlier, the columns in the
slack-variable submatrix are unchanged by pivoting except for column
$n + g$ [which is as shown in (5)]. Thus if $a^* = a_{gh}$ is the pivot entry, then

$$
P = \begin{bmatrix}
1 & & -a_{1h}/a^* & & \\
 & \ddots & \vdots & & \\
 & & 1/a^* & & \\
 & & \vdots & \ddots & \\
 & & -a_{mh}/a^* & & 1
\end{bmatrix}
\quad (\text{the altered column is column } g)
\tag{14}
$$

$P$ is the identity matrix with column $g$ replaced by the negative of the $h$th
column $\mathbf{a}_h$ of $A^*$ divided by $a^*$, except that entry $(g, g)$ of $P$ is $1/a^*$, the
inverse of the pivot value $a^* = a_{gh}$.

As a check, let us compute entry $(i, j)$, $i \ne g$, in the product $PA^*$. This
entry will be the scalar product $\mathbf{p}_i \cdot \mathbf{a}_j$ (where $\mathbf{p}_i$ denotes the $i$th row of $P$
and $\mathbf{a}_j$ the $j$th column of $A^*$). Row vector $\mathbf{p}_i$ has just two nonzero entries,
$p_{ii} = 1$ and $p_{ig} = -a_{ih}/a^*$. Thus

$$
\text{entry } (i, j) \text{ of } PA^* = \mathbf{p}_i \cdot \mathbf{a}_j = p_{ii}a_{ij} + p_{ig}a_{gj}
= 1 \cdot a_{ij} - \left(\frac{a_{ih}}{a^*}\right) a_{gj}
\tag{15}
$$

This expression corresponds to the value of entry $(i, j)$ in table (5), since
$a_{ij} = r$, $a_{gj} = p$ and $a_{ih} = q$. So $P$ is, as advertised, the desired pivot
matrix.
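
The claim that $PA^* = [PA \;\; P]$ reproduces the pivoted tableau can be tried on a small case. The sketch below is illustrative only: `pivot_matrix` and `matmul` are hypothetical helper names, indices are 0-based, and the 2-by-4 matrix is made-up data with the identity in its last two columns.

```python
from fractions import Fraction


def pivot_matrix(A, g, h):
    """Build P as in (14): the identity with column g replaced."""
    m = len(A)
    a_star = Fraction(A[g][h])
    P = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    for i in range(m):
        # Entry (g, g) is 1/a*; the other entries of column g are -a_ih/a*.
        P[i][g] = 1 / a_star if i == g else -Fraction(A[i][h]) / a_star
    return P


def matmul(P, A):
    """Plain matrix product of two lists of rows."""
    return [[sum(P[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(P))]


A_star = [[2, 1, 1, 0],   # [A  I] with m = 2 constraints, n = 2 variables
          [1, 3, 0, 1]]
P = pivot_matrix(A_star, 0, 0)
PA = matmul(P, A_star)
# The last m columns of the product are P itself, as (13) predicts.
print([row[2:] for row in PA] == P)   # True
```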

As mentioned above for table (5), the column vector $\mathbf{b}$ and objective
row vector $\mathbf{c}$ are changed in pivoting the same way $A^*$ is. So we can expand
$A^*$ into an $(m + 1)$-by-$(m + n + 1)$ matrix $A^+$ of all the data.

$$
A^+ = \begin{bmatrix} \mathbf{c} & \mathbf{0} & -d \\ A & I & \mathbf{b} \end{bmatrix}
\tag{16}
$$

(The minus sign for $d$ is a technicality based on the fact that the constant in
the objective function is not on the right-hand side of an equation, the way
the $b_i$ are.) The $(m + 1)$-by-$(m + 1)$ pivot matrix $P^+$ for $A^+$ has a corresponding
additional top row with 1 in its first entry, $-c_h/a^*$ in the entry for row $g$,
and 0's elsewhere.

Let us use the notation $P_{gh}$ to denote the pivot matrix when we pivot
on entry $(g, h)$ (to exchange independent variable $x_h$ and the slack variable
for row $g$). Then a sequence of $k$ pivots on entries $(g_1, h_1)$, $(g_2, h_2)$, ...,
$(g_k, h_k)$ would produce the new linear program with matrix $A_k^+$:

$$
A_k^+ = P_{g_k h_k} \cdots P_{g_2 h_2} P_{g_1 h_1} A^+
\tag{17}
$$

The revised simplex algorithm uses (17) to compute just the cost
coefficients $c_j$ in $A_k^+$, needed to select the pivot column. We determine which
column has the largest $c_j$; call its index $h'$. Next (17) is used to compute $\mathbf{b}$
and the $h'$th column of the constraint matrix in $A_k^+$, needed to select the
pivot row. The other entries in $A_k^+$ are never computed. The pivot row is the
row that achieves the minimum positive value of $b_i/a_{ih'}$; call it row $g'$. The
next pivot matrix $P_{g'h'}$ can be constructed using the entries of the $h'$th column
[see (14)]. This completes one iteration of the revised simplex algorithm.

The pivot matrices are easy to store and multiply. Only column $g'$
needs to be stored, and even this vector is likely to be very sparse, since
the coefficient matrices in large linear programs are very sparse. Because
we only calculate one row and two columns of the current linear program,
a tremendous savings in time is achieved over the standard simplex
algorithm.


Section 4.7 Linear Models for Differentiation and Integration


Although calculus deals with highly nonlinear functions, both the theory and 
computations associated with calculus are built on linear models. It is the 
computation that will be our primary interest in this section. We will show 
how linear approximations to arbitrary functions allow one to solve numer- 
ically almost any calculus or differential equation problem. 

We first point out the central role of linearity in calculus. The derivative
of a function is the essence of a linear model. The derivative $f'(x)$ of a
function $y = f(x)$ at a point $(x_0, y_0)$ is the slope of $f(x)$ at $(x_0, y_0)$. That is,
$f'(x_0)$ gives the slope of a line through $(x_0, y_0)$ that coincides with $f(x)$ when
$x$ is very close to $x_0$ (see Figure 4.14). In other words, the derivative gives
the slope of a linear approximation to $f(x)$ at a point.

An equally important aspect of linearity in calculus is the fact that
differentiation is a linear operation, in the sense that if $f(x) = ag(x) +
bh(x)$, where $a$, $b$ are constants, then $f'(x) = ag'(x) + bh'(x)$ [the same
way that $A(a\mathbf{u} + b\mathbf{v}) = aA\mathbf{u} + bA\mathbf{v}$]. For example, we compute the
derivative of $f(x) = 5x^3 + 6x^2$ by knowing that $3x^2$ is the derivative of $x^3$
and $2x$ is the derivative of $x^2$, and then $f'(x) = 5(3x^2) + 6(2x)$. Integration
is also a linear operation. Without this linearity, computing the derivative
or integral of a polynomial would be a very complicated process.

Figure 4.14 Slope of line tangent to y = f(x) is the derivative.

We now show how to use the linear approximation of a function given
by the derivative to build an iterative scheme to find zeros of a function.
The scheme is called Newton's method. For $x$-values close to $x = x_0$, we
have

$$f(x) \approx f'(x_0)x + c \quad \text{for an appropriate constant } c \tag{1}$$

Pretend that (1) is a good approximation for $f(x)$ over a wide interval around
$x = x_0$. We shall use (1) to approximate where $f(x)$ has a zero [where
$f(x) = 0$]. If $f(x)$ has value $f(x_0)$ at $x_0$ and has slope $f'(x_0)$, then the linear
approximation (1) decreases by $f'(x_0)$ for each unit we decrease $x$ and will
be zero if we decrease $x$ by $f(x_0)/f'(x_0)$. Thus we estimate the $x$-value $x_1$
that makes $f(x) = 0$ to be

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} \tag{2}$$

(see Figure 4.15a). Note that if $f(x_0)/f'(x_0) < 0$, then $x_1 > x_0$.

Normally, $f(x_1) \ne 0$, because the derivative $f'(x_0)$ only approximates
the slope of $f(x)$ when $x$ is very near $x_0$. However, there is a fair chance
that $x_1$ is close to a zero of $f(x)$. Let us use the approximation (1) again,
now at the point $x_1$, to estimate where $f(x)$ is zero. Similar to (2), we get

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$$

We continue in this manner to approximate the true zero of $f(x)$. The general
formula is

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \tag{3}$$

We stop when $f(x_n)$ is very close to 0. Once we get an $x$-value close
to a zero of $f(x)$, the approximation (1) is quite accurate and our method
converges quickly to a zero of $f(x)$.

Figure 4.15 (a) Newton's method. (b) Function $y = x^5 - 5x^4 - 4x^3 - 3x^2 - 2x - 1$.


We should mention that if one cannot compute f'(x) by a differen- 
tiation formula, then f'(x) must be estimated by the approximation 
[f(x + h) — f(x)]/h (for some small h). 
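
Iteration (3), together with the difference-quotient fallback for $f'(x)$, might be sketched in Python as follows. The function name `newton`, the tolerance, the step `h`, and the iteration cap are illustrative assumptions, not values taken from the text.

```python
def newton(f, x0, fprime=None, h=1e-6, tol=1e-10, max_iter=50):
    """Newton's method: x_{n+1} = x_n - f(x_n) / f'(x_n).

    If no derivative function is supplied, f'(x) is estimated by the
    difference quotient [f(x + h) - f(x)] / h.
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:          # stop when f(x_n) is very close to 0
            return x
        slope = fprime(x) if fprime else (f(x + h) - fx) / h
        if slope == 0:
            raise ZeroDivisionError("zero slope: Newton step undefined")
        x = x - fx / slope
    raise RuntimeError("no convergence within max_iter iterations")


# Usage with f(x) = x^2 - 5x + 4, starting at x0 = 10.
root = newton(lambda x: x * x - 5 * x + 4, 10.0, lambda x: 2 * x - 5)
print(round(root, 6))   # 4.0
```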


Example 1. Newton's Method


Consider the function $f(x) = x^2 - 5x + 4$. We use Newton's method
to find a zero of this function. First we note that this type of problem
arises in determining eigenvalues. Suppose that we want to find the
eigenvalues of a matrix

$$A = \begin{bmatrix} 3 & 2 \\ 1 & 2 \end{bmatrix}$$

Then we must find the zeros of $\det(A - \lambda I)$,

$$
\det(A - \lambda I) = \begin{vmatrix} 3 - \lambda & 2 \\ 1 & 2 - \lambda \end{vmatrix}
= (3 - \lambda)(2 - \lambda) - 2 \cdot 1 = \lambda^2 - 5\lambda + 4
$$

For this $f(x)$, $f'(x) = 2x - 5$. Suppose that we start with
$x_0 = 10$. We calculate that $f(10) = 54$ and $f'(10) = 15$. So Newton's
method estimates the zero to be at

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} = 10 - \frac{54}{15} = 6.4$$

Next we calculate $f(6.4) = 12.96$ and $f'(6.4) = 7.8$. Then

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} = 6.4 - \frac{12.96}{7.8} = 4.74$$

Calculating $f(4.74) = 2.77$ and $f'(4.74) = 4.48$ gives us

$$x_3 = x_2 - \frac{f(x_2)}{f'(x_2)} = 4.74 - \frac{2.77}{4.48} = 4.12$$

and, continuing in the same way,

$$x_4 = x_3 - \frac{f(x_3)}{f'(x_3)} = 4.006, \qquad x_5 = x_4 - \frac{f(x_4)}{f'(x_4)} = 4.000024$$

Clearly, we are converging to 4. That is, $f(4) = 0$.


Example 2. Bad Performance by Newton's Method


Use Newton's method to find a zero of the function $y = x^5 - 5x^4 -
4x^3 - 3x^2 - 2x - 1$. We have plotted this function in Figure 4.15b
with the y-axis magnified near the origin. This curve gets close to a
zero at $x \approx -.4$ but only has a relative maximum with $y \approx -.56$. It
decreases awhile and then increases sharply with $f(5.5) \approx -310$, $f(6)
\approx 300$, and $f(7) \approx 3000$. If we start Newton's method with $x_0 = 0$,
we get the sequence of points shown in Table 4.6.

We see that our sequence of points gets caught around the relative
maximum at $x \approx -.4$ and tends to swing back and forth from one side
of $-.4$ to the other. It finally gets very near the maximum on the 107th
iteration. Here the slope $f'(x_{107})$ is almost zero, so dividing by $f'(x_{107})$
($= .04$) in (3) brings a large change in $x$ that gets us away from this
relative maximum: $x_{108} = 13.14$. Now Newton's method homes in
on the zero.

Table 4.6

    i      x_i          f(x_i)      f'(x_i)
    0      0            -1          -2
    1      -.5          -.594       .812
    2      .231         -1.683      -4.26
    3      -.165        -.738       -1.24
    4      -.758        -1.37       6.02
    5      -.531        -.625       1.19
    6      -.008        -.984       -1.95
    7      -.512        -.604       .960
    8      .117         -1.28       -2.90
    9      -.324        -.589       -.580
    10     -1.344       -14.7       49
    11     -1.046       -4.85       20
    12     -.804        -1.67       7.54
    13     -.581        -.703       1.93
    14     -.218        -.679       -1.04
    15     -.868        -2.24       10.1
    ...
    107    -.420        -.56        .04
    108    13.14        233,700     101,763
    109    10.84        75,585      42,266
    110    9.06         24,144      17,789
    111    7.70         7,505       7,710
    112    6.73         2,183       3,578
    113    6.12         530         1,943
    114    5.848        76.5        1,400
    115    5.793        2.65        1,304
    116    5.791324     .0036       1,300
    117    5.791321     ~0          1,300

If we had started with $x_0 > 5$, the procedure would have
converged quickly. Although it did finally escape from the region of the
relative maximum, a better scheme would have been to pick a new
starting value when the method had not converged after 20 iterations.
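
The restart idea suggested at the end of Example 2 could look like the sketch below. The 20-iteration cap follows the text; the tolerance, the function names, and the list of candidate starting values are assumptions chosen for illustration.

```python
def f(x):
    return x**5 - 5 * x**4 - 4 * x**3 - 3 * x**2 - 2 * x - 1


def fprime(x):
    return 5 * x**4 - 20 * x**3 - 12 * x**2 - 6 * x - 2


def newton_capped(x0, max_iter=20, tol=1e-9):
    """Run at most max_iter Newton steps; return the zero or None."""
    x = x0
    for _ in range(max_iter):
        if abs(f(x)) < tol:
            return x
        x = x - f(x) / fprime(x)
    return x if abs(f(x)) < tol else None


def newton_with_restarts(starts, max_iter=20):
    """Try each starting value in turn until one converges."""
    for x0 in starts:
        root = newton_capped(x0, max_iter)
        if root is not None:
            return root
    return None


# Try x0 = 0 first, then fall back to a start beyond x = 5.
print(newton_with_restarts([0.0, 6.0]))
```

Because this polynomial has only one real zero, any start that does converge must reach the value near 5.7913 listed in Table 4.6.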

Next let us consider the definition of the integral of a function $f(x)$.
The integral of $f(x)$ is the area under a curve from 0 to $x$ (sometimes 0 is
replaced by another $x$-value $x_0$). The formal definition of an integral is the
limit of the sum of areas of approximating rectangles

$$\int_0^x f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^{n} (t_i - t_{i-1}) f(t_i) \tag{4}$$

where $t_i = i(x/n)$. The $t_i$ subdivide the interval $[0, x]$ into equal subintervals
of length $x/n$, with $t_0 = 0$ and $t_n = x$ (see Figure 4.16a). We are
approximating the area under the curve $f(x)$ with the area under the piecewise
approximation to $f(x)$:

$$f_n(x) = f(t_i) \quad \text{when } t_{i-1} < x \le t_i, \quad i = 1, 2, \ldots, n \tag{5}$$

That is, the sum on the right-hand side of (4) is the area under $f_n(x)$. The
function $f_n(x)$ is a piecewise constant function, since it is made up of constant
functions with a different constant on each subinterval of $[0, x]$.


Figure 4.16 (a) Set of rectangles whose areas approximate the area under
the curve. (b) Piecewise linear approximation to curve f(x) in part (a).
Set of trapezoids approximate the area under f(x).

The points $t_i$ in (4) are called mesh points. The collection of mesh
points and the subintervals they form is called the mesh. As the mesh
becomes finer (as $n$ increases), it is intuitively clear that the area under $f_n(x)$
will approach the area under the curve $f(x)$.

Let us consider how we might try to calculate the area under a function
$f(x)$ from 0 to $x^\circ$ when we do not know how to integrate exactly [i.e., no
integration formulas apply to $f(x)$, and integration by parts, etc., all fail].
One obvious approach is to use the definition of the integral given in (4).
That is, we pick some number of mesh points, say $n = 50$, for the interval
$[0, x^\circ]$. Then we calculate the value of $f(x)$ at the mesh points and compute
the sum on the right-hand side of (4). A better estimate for the area under
$f(x)$ should be obtained by using linear approximations to $f(x)$ in each
subinterval that agree with the values of $f(x)$ at the subinterval endpoints (see
Figure 4.16b). It is left to the reader to check that the following piecewise
linear approximation does this.


$$
g_n(x) = \frac{f(t_i) - f(t_{i-1})}{t_i - t_{i-1}}\,(x - t_{i-1}) + f(t_{i-1}),
\qquad t_{i-1} \le x \le t_i, \quad i = 1, 2, \ldots, n
\tag{6}
$$

By looking at Figure 4.16b, one sees that the region in each subinterval
using (6) is a trapezoid. The area of this trapezoid is the same as the area
of a rectangle with height halfway between $f(t_{i-1})$ and $f(t_i)$. That is, using
(6) gives the same area as using the piecewise constant function

$$
g^*(x) = \frac{f(t_{i-1}) + f(t_i)}{2}, \qquad t_{i-1} < x \le t_i, \quad i = 1, 2, \ldots, n
\tag{7}
$$

Let $h = t_i - t_{i-1} = x^\circ/n$. Then the integral of $f(x)$ from 0 to $x^\circ$ is
approximated by the area under $g^*(x)$,

$$
\int_0^{x^\circ} g^*(x)\,dx = h \sum_{i=1}^{n} \frac{f(t_{i-1}) + f(t_i)}{2}
= \frac{h f(t_0)}{2} + h \sum_{i=1}^{n-1} f(t_i) + \frac{h f(t_n)}{2}
\tag{8}
$$

Using (8) to approximate an integral is an integration scheme called the
trapezoidal rule (named after the trapezoids in Figure 4.16b). Note that this
rule weights the two endpoint values $f(t_0)$ and $f(t_n)$ half as much as the
other $f(t_i)$. Various schemes have been developed that give different sets of
weights to the $f(t_i)$.


Example 3. Piecewise Approximation of Area Under Curve

Consider the function $f(x) = 1/(x^2 + 1)$. We want to compute the
area under $f(x)$ on the interval [0, 2] using the piecewise
approximations given in (5) and (6). To make calculations easy, let the mesh
points be $t_0 = 0$, $t_1 = .5$, $t_2 = 1$, $t_3 = 1.5$, $t_4 = 2$. Then we get the
table of function values (rounded to one decimal place)

    t_i       0     .5     1     1.5    2
    f(t_i)    1     .8     .5    .3     .2

The approximation of $f(x)$ by the function in (5),

$$f_4(x) = f(t_i) \quad \text{for } t_{i-1} < x \le t_i, \quad i = 1, 2, 3, 4$$

gives

$$
f_4(x) = \begin{cases}
.8 & \text{on } (0, .5] \\
.5 & \text{on } (.5, 1] \\
.3 & \text{on } (1, 1.5] \\
.2 & \text{on } (1.5, 2]
\end{cases}
\tag{9}
$$

and the approximation of $f(x)$ by the function in (6),

$$
g_4(x) = \frac{f(t_i) - f(t_{i-1})}{t_i - t_{i-1}}\,(x - t_{i-1}) + f(t_{i-1})
\quad \text{for } t_{i-1} \le x \le t_i, \quad i = 1, 2, 3, 4
$$

gives

$$
g_4(x) = \begin{cases}
-.4x + 1 & \text{on } [0, .5] \\
-.6(x - .5) + .8 & \text{on } [.5, 1] \\
-.4(x - 1) + .5 & \text{on } [1, 1.5] \\
-.2(x - 1.5) + .3 & \text{on } [1.5, 2]
\end{cases}
\tag{10}
$$

The area under $f_4(x)$ equals $.5(.8 + .5 + .3 + .2) = .9$ and the area
under $g_4(x)$ equals [using (8)] $.5\{1/2 + (.8 + .5 + .3) + .2/2\} =
1.1$. The actual area under $1/(x^2 + 1)$ from 0 to 2 is about 1.11. So,
as expected, $g_4(x)$ led to a better estimate of the area.
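
The two estimates in Example 3 can be recomputed with a short Python sketch of the rectangle sum in (4) and the trapezoidal rule (8). The function names are hypothetical; because the code uses unrounded function values (for instance $f(1.5) \approx .3077$ rather than .3), its results differ slightly from the hand calculation.

```python
import math


def right_rectangle_sum(f, a, b, n):
    """Sum in (4): rectangles of width h with heights f(t_1), ..., f(t_n)."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(1, n + 1))


def trapezoid(f, a, b, n):
    """Trapezoidal rule (8): endpoint values get half weight."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (f(a) / 2 + interior + f(b) / 2)


def f(x):
    return 1 / (x * x + 1)


print(round(right_rectangle_sum(f, 0, 2, 4), 4))   # 0.9038
print(round(trapezoid(f, 0, 2, 4), 4))             # 1.1038
print(round(math.atan(2), 4))                      # 1.1071, the exact area
```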


In (5) we approximated $f(x)$ with a piecewise constant function $f_n(x)$,
and in (6) we approximated $f(x)$ with a piecewise linear function $g_n(x)$. A
better approximation to the area under $f(x)$ can be sought by using piecewise
approximations to $f(x)$ that are quadratic or cubic functions. It turns out that
cubic functions have several good properties that quadratics lack, so
piecewise cubic functions are frequently used to approximate functions in
integration and other calculations. The cubic function in the $i$th mesh interval
would be

$$s_i(x) = a_i + b_i x + c_i x^2 + d_i x^3 \quad \text{for } t_{i-1} \le x \le t_i \tag{11}$$

We want each cubic piece to equal $f(x)$ at the mesh points $t_{i-1}$ and $t_i$,
that is,

$$s_i(t_{i-1}) = f(t_{i-1}) \quad \text{and} \quad s_i(t_i) = f(t_i), \qquad i = 1, 2, \ldots, n \tag{12}$$

Another desirable property for an approximating function is to be
smooth at the mesh points $t_i$, where the pieces meet. By this we mean that
the first and second derivatives of two successive cubics coincide at their
common mesh point. Then we require

$$
\begin{aligned}
s_i'(t_i) &= s_{i+1}'(t_i) \\
s_i''(t_i) &= s_{i+1}''(t_i), \qquad i = 1, 2, \ldots, n - 1
\end{aligned}
\tag{13}
$$


The term spline is the name given to a smooth piecewise function. Cubic 
splines are used to approximate curves in thousands of applications. For 
example, automotive designers use splines to approximate car contours in 
computer models that simulate a car’s wind resistance. Splines are widely 
used in computer graphics to generate the forms of complex figures; they 
are used to join points made with a light pen in tracing out a figure on a 
terminal screen. Once a figure is represented by mathematical equations, it 
is an easy matter to rotate, shrink, and perform other transformations of the 
figure, as discussed in Section 4.1. 

The problem with splines is determining the coefficients in each of the 
cubic pieces. However, the system of equations (12) and (13) determining 
the coefficients in the splines can be collapsed to a tridiagonal system of 
$n - 1$ equations in $n - 1$ unknowns. See the Appendix to this section for
details. Such tridiagonal systems can be solved very quickly (see Section 
3.5), so spline approximations can be computed very quickly. 

Another important approach, called functional approximation, for ap- 
proximating a function for integration and other purposes is to use one linear 
combination of nice functions (e.g., whose integrals are easily computed) 
to approximate f(x) over the entire interval from 0 to x°. This approach is 
discussed in Section 5.4. 

We next consider linear approximations to solutions of differential
equations. The derivative $f'(x)$, the slope of function $f(x)$ at $x$, is
approximated by

$$f'(x) \approx \frac{f(x + h) - f(x)}{h} \tag{14}$$

Indeed, the limit of (14) as $h$ goes to zero is by definition $f'(x)$. First and
second derivatives of complicated functions are often needed in differential
equations computations.
Example 4. Discrete Approximation to a Differential Equation


Consider $y(x)$, the temperature of a rod, as a function of $x$, the distance
from the left end of the rod. If a heat source is applied to the rod with
a temperature $f(x)$ at position $x$, the following differential equation
describes how $f(x)$ affects $y(x)$.

$$\frac{d^2 y}{dx^2} = -f(x), \qquad 0 \le x \le d \quad (d = \text{length of rod}) \tag{15}$$

For example, $f(x)$ might be zero everywhere except at two short
segments of the rod. It is too difficult to solve this differential equation
analytically for most interesting $f(x)$. Instead, one seeks an
approximate solution by computing values of $y(x)$ at a set of mesh points $x_0$,
$x_1, \ldots, x_n$ on the interval $[0, d]$. If $h = x_i - x_{i-1}$, then at $x = x_i$,
(14) becomes

$$y'(x_i) \approx \frac{y(x_{i+1}) - y(x_i)}{h} \tag{16}$$

We could just as well use

$$y'(x_i) \approx \frac{y(x_i) - y(x_{i-1})}{h} \tag{16'}$$

Next, using the fact that $y''(x)$ is the derivative of the derivative, we
can estimate $y''(x_i)$ using (16) or (16') twice. To make the result
symmetric about $x_i$, we use (16) for $y'(x_i)$ and (16') to get $y''(x_i)$.

$$
y''(x_i) \approx \frac{y'(x_i) - y'(x_{i-1})}{h}
\approx \frac{y(x_{i+1}) - 2y(x_i) + y(x_{i-1})}{h^2}
\tag{17}
$$

Substituting (17) into (15), we obtain a system of equations for the
values $y(x_i)$. Letting $y_i = y(x_i)$ and $f_i = f(x_i)$, we have

$$\frac{y_{i+1} - 2y_i + y_{i-1}}{h^2} = -f_i$$

or

$$y_{i+1} - 2y_i + y_{i-1} = -h^2 f_i, \qquad i = 1, \ldots, n - 1 \tag{18}$$


We have $n - 1$ equations for the $n + 1$ unknowns $y_0, y_1, \ldots,
y_n$. Like the differential equations in Section 3.3, we need to specify
two starting, or boundary, conditions. Typically, these have the form
$y_0 = a$ and $y_n = b$. Suppose, for later reference, that we choose the
boundary conditions

$$y_0 = y_n = 0 \tag{19}$$

The physical interpretation of (19) is that both ends of the rod are
attached to objects that dissipate away all the heat at the ends. Now
we drop $y_0$ and $y_n$ from (18), since by (19) they are 0. This reduces
(18) to a system of $n - 1$ equations in $n - 1$ unknowns.

In matrix terms, (18) becomes

$$D\mathbf{y} = \mathbf{f}^* \tag{20}$$

where $\mathbf{y} = (y_1, y_2, \ldots, y_{n-1})$ (remember that $y_0$ and $y_n$ are dropped,
since they are 0), $\mathbf{f}^* = (-h^2 f_1, -h^2 f_2, \ldots, -h^2 f_{n-1})$, and $D$ is

$$
D = \begin{bmatrix}
-2 & 1 & 0 & 0 & \cdots & 0 \\
1 & -2 & 1 & 0 & \cdots & 0 \\
0 & 1 & -2 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & 1 & -2 & 1 \\
0 & \cdots & 0 & 0 & 1 & -2
\end{bmatrix}
\tag{21}
$$

Observe that $D$ is a tridiagonal matrix, so $D\mathbf{y} = \mathbf{f}^*$ can be solved
very quickly by Gaussian elimination (see Section 3.5). Let us use
(20) to get an approximate solution to our differential equation (15) on
the interval [0, 1] with 100 mesh points. So $n = 100$ and $h =
1/n = .01$. Suppose that $f(x)$ represents a heat source applied to a
point at the middle of the rod:

$$f_{50} = 10{,}000 \quad \text{and all other } f_i = 0 \tag{22}$$

Then $f^*_{50} = -h^2 f_{50} = -(.01)^2 \times 10{,}000 = -1$. So $\mathbf{f}^*$ is all 0's except
$-1$ in its fiftieth entry.

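Applying Gaussian elimination to (21) plus the right-side vector
$\mathbf{f}^*$, we obtain the following matrix.

$$
\left[\begin{array}{cccccccc|c}
-2 & 1 & 0 & 0 & \cdots & & & & 0 \\
0 & -\tfrac{3}{2} & 1 & 0 & \cdots & & & & 0 \\
0 & 0 & -\tfrac{4}{3} & 1 & \cdots & & & & 0 \\
 & & & \ddots & \ddots & & & & \vdots \\
 & & & & -\tfrac{51}{50} & 1 & & & -1 \\
 & & & & & -\tfrac{52}{51} & 1 & & -\tfrac{50}{51} \\
 & & & & & & \ddots & \ddots & \vdots \\
 & & & & & & -\tfrac{99}{98} & 1 & -\tfrac{50}{98} \\
 & & & & & & 0 & -\tfrac{100}{99} & -\tfrac{50}{99}
\end{array}\right]
\tag{23}
$$

Back substitution yields

$$
y_k = \begin{cases}
\dfrac{k}{2}, & k = 1, 2, \ldots, 50 \\[1ex]
\dfrac{100 - k}{2}, & k = 51, 52, \ldots, 99
\end{cases}
\tag{24}
$$

The continuous function $y(x)$ that (24) approximates is clearly

$$
y(x) = \begin{cases}
50x, & 0 \le x \le .5 \\
50 - 50x, & .5 \le x \le 1
\end{cases}
\tag{25}
$$

The solution (25) says that if the middle of the rod is heated and
the ends are kept at temperature 0, the temperature will decrease at a
uniform rate along the rod toward the ends (as opposed to an
exponential decay or some other nonlinear decrease). This uniform decrease
is indeed what occurs in nature.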
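
The calculation in Example 4 can be reproduced with a small tridiagonal solver. The sketch below uses forward elimination followed by back substitution, specialized to three diagonals; the function names and argument layout are assumptions made for illustration, not the book's notation.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Gaussian elimination for a tridiagonal system.

    sub[i-1] is the entry in row i, column i-1; sup[i] is the entry in
    row i, column i+1; diag and rhs have one entry per row.
    """
    n = len(diag)
    d = [float(v) for v in diag]
    r = [float(v) for v in rhs]
    # Forward elimination: remove the subdiagonal entry of each row.
    for i in range(1, n):
        m = sub[i - 1] / d[i - 1]
        d[i] -= m * sup[i - 1]
        r[i] -= m * r[i - 1]
    # Back substitution, starting from the last row.
    y = [0.0] * n
    y[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        y[i] = (r[i] - sup[i] * y[i + 1]) / d[i]
    return y


def rod_temperatures(f_values, h):
    """Solve D y = f* from (20), where f_values = [f_1, ..., f_{n-1}]."""
    m = len(f_values)
    rhs = [-h * h * fi for fi in f_values]
    return solve_tridiagonal([1] * (m - 1), [-2] * m, [1] * (m - 1), rhs)


f_values = [0.0] * 99
f_values[49] = 10000.0            # f_50 = 10,000, all other f_i = 0
y = rod_temperatures(f_values, 0.01)
print(round(y[0], 6), round(y[49], 6), round(y[98], 6))   # 0.5 25.0 0.5
```

The printed values agree with (24): $y_1 = 1/2$, $y_{50} = 25$, and $y_{99} = 1/2$.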


By letting the number of mesh points grow very large, our solution 
vector y in (24) becomes a better and better approximation to the true solution 
y(x). In the limit, y becomes an infinite vector that ‘‘is’’ y(x). Thus any 
continuous function can be thought of as an infinite-length vector. In Section 
5.4 we show another way to view continuous functions as vectors. 

Most methods of solving differential equations by discrete
approximation are called finite difference schemes. The other basic approach to
approximating solutions to differential equations is finite element methods.
Here the function f(x) is approximated as a sum of special functions for
which the differential equation is easily solved.

Section 4.7 Exercises


Summary of Exercises

Exercises 1-3 involve finding zeros with Newton's method. Exercises 4-9
involve approximating areas under curves. Exercises 10 and 11 involve finite
difference methods for solving differential equations. Problems about splines
appear in the Exercises in the appendix to this section.


1. Use Newton's method to find a zero of the following functions; let
$x_0 = 2$.


(a) f(x) = 3x — 5 

(b) f(x) = x* + 6x + 5 

(c) f(x) = x — 3x* + 3x - 1 

(d) f(x) = x + x* -— 2x 

(e) f(x) = cos(x) (x in radians) 

(f) f(x) = x* — 2x + 8 

(g) fx) = x* — 4° + 5x* — x - 15 
(h) f(x) = vr + 4e —- 8x - 3 


2. Use Newton's method to find all zeros of the functions in Exercise 1
in the interval from $-2$ to 6.
Hint: If $a$ is a zero of $f(x)$, the other zeros of $f(x)$ are also zeros of
$f(x)/(x - a)$.


3. Use Newton's method to find all eigenvalues of the following matrices.
(See the hint in Exercise 2 and see Section 3.1 for determinant
formulas.)
4 0 eee oD OQ .4 4 
a) 1, 4 mie" o 2 4 6 
Oo  & 24 Pus - So 
ao i Bese 
a|> 9 9 9 
ee 
0. 0< gry 


4. (a) Use the trapezoidal rule, equation (8), to estimate the area under 
the curve f(x) = x? — 2x + 1 from 0 to 4 with the mesh 
{0, 1, 2, 3, 4}. Determine the area exactly by integration. Also plot 
this function and the approximation given by the piecewise linear 
function in (6). 

(b) See how accurate your answer gets with a denser mesh. Compute
with the trapezoidal rule again with meshes:
(i) {0, .5, 1, 1.5, 2, 2.5, 3, 3.5, 4}
(ii) {0, .25, .5, .75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3,
3.25, 3.5, 3.75, 4}

5. (a) Use the trapezoidal rule, equation (8), to estimate the area under
the curve f(x) = x — 2x + 4 from 0 to 4 with the mesh 
{0, 1, 2, 3, 4}. Determine the area exactly by integration. Also plot 
this function and the approximation given by the piecewise linear 
function in (6). 

(b) See how accurate your answer gets with a denser mesh. Compute
with the trapezoidal rule again with the mesh {0, .5, 1, 1.5, 2, 2.5,
3, 3.5, 4}.


6. (a) Use the trapezoidal rule, equation (8), to estimate the area under
the curve $f(x) = e^x$ from 0 to 4 with the mesh {0, 1, 2, 3, 4}.
Determine the area exactly by integration. Also plot this function
and the approximation given by the piecewise linear function
in (6).

(b) See how accurate your answer gets with a denser mesh. Compute
with the trapezoidal rule again with meshes:
(i) {0, .5, 1, 1.5, 2, 2.5, 3, 3.5, 4}
(ii) {0, .25, .5, .75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3,
3.25, 3.5, 3.75, 4}

7. (a) Use the trapezoidal rule, equation (8), to estimate the area under
the curve $f(x) = 10/(x + 1)$ from 0 to 4 with the mesh
{0, 1, 2, 3, 4}. Determine the area exactly by integration. Also plot
this function and the approximation given by the piecewise linear
function in (6).

(b) See how accurate your answer gets with a denser mesh. Compute
with the trapezoidal rule again with meshes:
(i) {0, .5, 1, 1.5, 2, 2.5, 3, 3.5, 4}
(ii) {0, .25, .5, .75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3,
3.25, 3.5, 3.75, 4}

8. (a) Use the trapezoidal rule, equation (8), to estimate the area
under the curve $f(x) = \sin(2\pi/x)$ from .25 to 1 with the mesh
{.25, .5, .75, 1}. Also plot this function and the approximation
given by the piecewise linear function in (6). This function cannot
be integrated by any standard integration technique.

(b) See how much your answer changes with a denser mesh. Compute
with the trapezoidal rule with the variable mesh {.25, .26, .27, .28, 
DDS Ded bg sty, os tte ay SAR os, Dai ahs ete Le 


9. Simpson's rule approximates an integral by

$$
\int_a^b f(x)\,dx \approx \frac{h}{3}\,\{f(a) + 4f(t_1) + 2f(t_2) + 4f(t_3) + 2f(t_4) + \cdots
+ 4f(t_{n-1}) + f(b)\}
$$

where $t_i = a + hi$, $i = 1, 2, \ldots, n - 1$, $n$ even, and $h =
(b - a)/n$.

(a) Simpson's rule finds the exact integral of quadratic functions.
Verify this by using Simpson's rule to approximate the given integrals
and check by formal integration.

(i) $f(x) = x^2 + 1$ from 0 to 1 with $n = 4$.
(ii) $f(x) = x^2 - 2x + 4$ from 1 to 3 with $n = 2$.

(b) Repeat Exercise 5, part (a) with Simpson’s rule. Is the Simpson’s 
rule approximation more accurate? 

(c) Repeat Exercise 6, part (a) with Simpson’s rule. Is the Simpson’s 
rule approximation more accurate? 

(d) Repeat Exercise 7, part (a) with Simpson’s rule. Is the Simpson’s 
rule approximation more accurate? 


10. Repeat the method for solving $d^2y/dx^2 = -f(x)$, $0 \le x \le 1$,
approximately in Example 4 with $y(0) = y(1) = 0$ using the following $f(x)$.


(a) f(x) = {f.; = —10,000 and all other f, = 0} 
(b) f(x) = if, = —10,000 and all other f; = 0} 
11. Use the method for solving a differential equation in Example 4 to solve 
approximately 
d*y 
a2 ~~ 10,000, Oe x | 


with h = .01 and y(0) = y(1) = 0.


Appendix: Computing Cubic Spline Approximations


In this appendix we show how the equations for determining the coefficients 
in a cubic spline can be greatly simplified and reduced to a tridiagonal system 
of n — | equations in n — 1 unknowns, where n is the number of sub- 
intervals in the cubic spline. 

Suppose that we divide the interval $[a, b]$ into $n$ subintervals. Let
$a = t_0 < t_1 < \cdots < t_n = b$ be the mesh points. Recall that a cubic spline
$s(x)$ for a function $f(x)$ is a piecewise cubic function such that


(a) $s(t_i)$ equals the function $f(t_i)$ at each $t_i$.
(b) The first and second derivatives $s'(x)$ and $s''(x)$ are continuous.


We only know the values of $f(x)$ at the mesh points $t_0, t_1, \ldots, t_n$ and want
to interpolate values for $f(x)$ over the whole interval.

Let $s_i(x)$ be the cubic polynomial in subinterval $[t_i, t_{i+1}]$, $i =
0, 1, \ldots, n - 1$. Then condition (a) becomes

$$s_i(t_i) = f(t_i), \qquad i = 0, 1, \ldots, n - 1 \tag{1}$$

$$s_i(t_{i+1}) = f(t_{i+1}), \qquad i = 0, 1, \ldots, n - 1 \tag{2}$$

and condition (b) becomes

$$s_i'(t_{i+1}) = s_{i+1}'(t_{i+1}), \qquad i = 0, 1, \ldots, n - 2 \tag{3}$$

$$s_i''(t_{i+1}) = s_{i+1}''(t_{i+1}), \qquad i = 0, 1, \ldots, n - 2 \tag{4}$$

This gives us $4n - 2$ equations in $4n$ unknowns. As in the discrete
approximation of a differential equation in Example 4 of Section 4.7, we need two
extra constraints. The ones that work out the best are

$$s_0''(t_0) = 0 \quad \text{and} \quad s_{n-1}''(t_n) = 0 \tag{5}$$


Let us assume that the mesh points are equally spaced with f;,, — 
t. = h. The first step is to express s,(x) in the translated form 


s(x) = a; + B(x —‘t,) + efx. -— ty (6) 
+ d(x — t,)’, a) eee 


As a result of (6), when x = ¢,, then s{t;) = a,, s;(t;) = 6; and 
s(t;) = 2c;. Observe also that 


S(tj4;) = a, + bh + ch? + dh 
S;(t;.44) = Bb, + 2¢;h. 4+ 3d;h? (7) 
Si(ti414) = 2c; + 6d;h 


Using (7), we rewrite conditions (1)-(5) as 


a; = fi(t;), Be Ne de Ge Te Ce 
din, = 4, + BA + Gh +f’, ?f = Velweawe r= 1 
b.., = b, + 2¢,h + 3d,h’, P= O24 geese SS 
2C;4, = 2c; + 6d,h or 

Cray = 6, * Jan, b= Di gent =e 
Co = Oandc, = 0 (5') 


Here we have ‘‘invented’’ a,[=f(¢,)] in (1’) and c, (= 0) in (5’). 
There is no s,(x) but the terms a, and c, are a useful way to represent 
conditions on the spline at ¢,,. 

From (1") we see that the a,’s can be determined immediately from the 
values f(t;). We shall now proceed to show how to express the d,’s and b,’s 
in terms of the c,’s and the a,’s, and then we obtain a tridiagonal system to 
solve for the c;’s. 

Solving (4') for $d_i$, we have


$d_i = \frac{c_{i+1} - c_i}{3h}$    (8)

Substituting this value for $d_i$ in (2') and (3'), we obtain


$a_{i+1} = a_i + h b_i + \frac{h^2}{3}(2c_i + c_{i+1}), \quad i = 0, 1, \ldots, n - 1$    (2'')
$b_{i+1} = b_i + h(c_i + c_{i+1}), \quad i = 0, 1, \ldots, n - 2$    (3'')

Solving (2'') for $b_i$, we obtain


$b_i = \frac{a_{i+1} - a_i}{h} - \frac{h(2c_i + c_{i+1})}{3}, \quad i = 0, 1, \ldots, n - 1$    (9)


Reducing the subscript by 1, (9) becomes


$b_{i-1} = \frac{a_i - a_{i-1}}{h} - \frac{h(2c_{i-1} + c_i)}{3}, \quad i = 1, 2, \ldots, n$    (9')


Let us also rewrite (3'') with reduced subscript

$b_i = b_{i-1} + h(c_{i-1} + c_i), \quad i = 1, 2, \ldots, n - 1$    (3''')

Now we substitute (9) for $b_i$ and (9') for $b_{i-1}$ in (3''') to obtain


$\frac{a_{i+1} - a_i}{h} - \frac{h(2c_i + c_{i+1})}{3} = \frac{a_i - a_{i-1}}{h} - \frac{h(2c_{i-1} + c_i)}{3} + h(c_{i-1} + c_i)$    (10)


Multiplying by 3, dividing by h, and collecting the $a_i$'s on the right side, we have


$(2c_i + c_{i+1}) - (2c_{i-1} + c_i) + 3(c_{i-1} + c_i) = \frac{3(a_{i+1} - a_i)}{h^2} - \frac{3(a_i - a_{i-1})}{h^2}$

or

$c_{i-1} + 4c_i + c_{i+1} = \frac{3}{h^2}(a_{i+1} - 2a_i + a_{i-1}), \quad i = 1, 2, \ldots, n - 1$    (11)


Recall from (1') that $a_i = f(t_i)$. Once we solve (11) for the $c_i$'s, then by (8) we can determine the $d_i$'s and by (9) we can determine the $b_i$'s. Thus, to determine the 4n coefficients in the n cubic polynomials of our cubic spline, we only need to solve the (n − 1)-by-(n − 1) tridiagonal system
(11) (recall that $c_0$ and $c_n$ equal 0).

Note that it would have been possible to develop the foregoing equations without equally spaced subintervals; unequal spacing is natural for curves that are straight most of the time but change rapidly over a few short intervals. If $h_i = t_{i+1} - t_i$, the tridiagonal system (11) can be shown to be


$h_{i-1}c_{i-1} + 2(h_{i-1} + h_i)c_i + h_i c_{i+1} = \frac{3}{h_i}(a_{i+1} - a_i) - \frac{3}{h_{i-1}}(a_i - a_{i-1})$    (12)
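The equally spaced procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the book: the function names `natural_cubic_spline` and `evaluate` are made up here, and the tridiagonal system (11) is solved with the standard forward-elimination/back-substitution scheme for tridiagonal matrices rather than general Gaussian elimination. The sample data are the rounded function values of x · sin(πx) at the mesh points 1, 1.25, ..., 3 used in the example that follows.

```python
def natural_cubic_spline(t, a):
    """Coefficients a_i, b_i, c_i, d_i for an equally spaced mesh t[0..n]
    with data values a[0..n], using equations (11), (8), and (9)."""
    n = len(t) - 1
    h = t[1] - t[0]
    # Right-hand sides of (11) for i = 1, ..., n-1.
    rhs = [3.0 / h**2 * (a[i + 1] - 2 * a[i] + a[i - 1]) for i in range(1, n)]
    m = n - 1
    diag = [4.0] * m          # main diagonal; both off-diagonals are 1
    # Forward elimination on the tridiagonal system.
    for k in range(1, m):
        w = 1.0 / diag[k - 1]
        diag[k] -= w
        rhs[k] -= w * rhs[k - 1]
    # Back substitution.
    inner = [0.0] * m
    inner[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):
        inner[k] = (rhs[k] - inner[k + 1]) / diag[k]
    c = [0.0] + inner + [0.0]                      # c_0 = c_n = 0 by (5')
    d = [(c[i + 1] - c[i]) / (3 * h) for i in range(n)]                            # (8)
    b = [(a[i + 1] - a[i]) / h - h * (2 * c[i] + c[i + 1]) / 3 for i in range(n)]  # (9)
    return a[:n], b, c[:n], d


def evaluate(t, coeffs, x):
    """Evaluate the spline at x using the translated form (6)."""
    a, b, c, d = coeffs
    h = t[1] - t[0]
    i = max(0, min(int((x - t[0]) / h), len(a) - 1))
    u = x - t[i]
    return a[i] + b[i] * u + c[i] * u**2 + d[i] * u**3


# Rounded values of x*sin(pi*x) at the nine mesh points 1, 1.25, ..., 3.
t = [1.0 + 0.25 * i for i in range(9)]
a = [0.0, -0.884, -1.5, -1.237, 0.0, 1.591, 2.5, 1.945, 0.0]
coeffs = natural_cubic_spline(t, a)
print([round(v, 3) for v in coeffs[2]])   # the c_i's, i = 0, ..., 7
print(round(evaluate(t, coeffs, 2.4), 3))
```

Because the system is tridiagonal, the elimination touches each equation only once, which is why the spline coefficients can be found so cheaply even for a fine mesh.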


Example 1. Cubic Spline Approximation of x · sin(πx)

Let us use a cubic spline to approximate the function f(x) = x · sin(πx) over the interval [1, 3]. We use the values of f(x) at nine mesh points: 1, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0 (h = .25), yielding eight subintervals. We have the following table of values at these points.

Mesh Point        Function Value f(x) = x · sin(πx)

t_0 = 1.0         f(t_0) = 0
t_1 = 1.25        f(t_1) = −.884
t_2 = 1.5         f(t_2) = −1.5
t_3 = 1.75        f(t_3) = −1.237
t_4 = 2.0         f(t_4) = 0               (13)
t_5 = 2.25        f(t_5) = 1.591
t_6 = 2.5         f(t_6) = 2.5
t_7 = 2.75        f(t_7) = 1.945
t_8 = 3.0         f(t_8) = 0


By (1'), $a_i = f(t_i)$, so the second column of (13) gives the values of the $a_i$'s. Next we write the system of seven equations in seven unknowns given by equation (11) (note that $3/h^2 = 3/.25^2 = 48$).


$4c_1 + c_2 = 48(0 + 2 \cdot .884 - 1.5) = 12.864$
$c_1 + 4c_2 + c_3 = 48(-.884 + 2 \cdot 1.5 - 1.237) = 42.192$
$c_2 + 4c_3 + c_4 = 48(-1.5 + 2 \cdot 1.237 + 0) = 46.752$
$c_3 + 4c_4 + c_5 = 48(-1.237 - 2 \cdot 0 + 1.591) = 16.992$    (14)
$c_4 + 4c_5 + c_6 = 48(0 - 2 \cdot 1.591 + 2.5) = -32.736$
$c_5 + 4c_6 + c_7 = 48(1.591 - 2 \cdot 2.5 + 1.945) = -70.272$
$c_6 + 4c_7 = 48(2.5 - 2 \cdot 1.945 + 0) = -66.720$

Solving (14) by Gaussian elimination, we obtain


$c_1 = 1.204$, $c_2 = 8.048$, $c_3 = 8.797$, $c_4 = 3.520$,
$c_5 = -5.883$, $c_6 = -12.722$, $c_7 = -13.499$


Using (8) to determine the $d_i$'s and (9) to determine the $b_i$'s, we obtain the cubic polynomials $s_i(x)$:


$s_0(x) = 0 - 3.636(x - 1) + 0(x - 1)^2 + 1.605(x - 1)^3$
$s_1(x) = -.884 - 3.335(x - 1.25) + 1.204(x - 1.25)^2 + 9.125(x - 1.25)^3$
$s_2(x) = -1.5 - 1.022(x - 1.5) + 8.048(x - 1.5)^2 + .999(x - 1.5)^3$
$s_3(x) = -1.237 + 3.189(x - 1.75) + 8.797(x - 1.75)^2 - 7.036(x - 1.75)^3$
$s_4(x) = 0 + 6.268(x - 2.0) + 3.520(x - 2.0)^2 - 12.537(x - 2.0)^3$
$s_5(x) = 1.591 + 5.677(x - 2.25) - 5.883(x - 2.25)^2 - 9.119(x - 2.25)^3$
$s_6(x) = 2.5 + 1.025(x - 2.5) - 12.722(x - 2.5)^2 - 1.036(x - 2.5)^3$
$s_7(x) = 1.945 - 5.530(x - 2.75) - 13.499(x - 2.75)^2 + 17.999(x - 2.75)^3$


Finally, we give a table comparing the values of the spline approximation s(x) with the original function f(x) = x · sin(πx).


x        f(x)       s(x)
1.0      0          0
1.1      −.340      −.362
1.2      −.705      −.714
1.3      −1.052     −1.046
1.4      −1.331     −1.326
1.5      −1.5       −1.5
1.6      −1.522     −1.521
1.7      −1.375     −1.375
1.8      −1.058     −1.056
2.0      0          0          (15)
2.1      .649       .649
2.2      1.293      1.294
2.3      1.861      1.859
2.4      2.282      2.279
2.5      2.5        2.5
2.6      2.473      2.474
2.7      2.184      2.188
2.8      1.646      1.637
2.9      .896       .873
3.0      0          0


A very good fit.

There are other ways to obtain a tridiagonal system for determining the coefficients in a cubic spline; see Cheney and Kincaid (mentioned in the References) for a method starting with the second derivative of s(x) and integrating twice.


Section 4.7 Appendix Exercises


Summary of Exercises
These exercises require the construction of cubic splines to approximate various functions. The reader should mimic the steps in Example 1.


1. Repeat the calculations in Example 1 but now use the interval [3, 4] with mesh points 3, 3.25, 3.5, 3.75, 4. Compare the values of your approximation with the true values of f(x) at x = 3.1 and 3.9.


2. (a) Use a cubic spline to approximate the function f(x) = sin(πx) over the interval [0, 1] with mesh points 0, .25, .5, .75, 1. Compare the values of your approximation with the true values of f(x) at x = .1 and .65.

(b) Use your cubic spline to approximate the integral of sin(πx) from 0 to 1. Note the exact answer is 2/π.


3. Repeat Exercise 2 using mesh points 0, .2, .4, .6, .8, 1.


4. (a) Use a cubic spline to approximate the function f(x) = e * over the


interval [0, 4] with mesh points 0, 1, 2, 3, 4. Compare the values 
of your approximation with the true values of f(x) at x = .5 and 
Fes 

(b) Use your cubic spline to approximate the integral of e * from 0 to 
4. Integrate to compute the exact answer. 


5. (a) Use a cubic spline to approximate the function $f(x) = \log_e x$ over the interval [.5, 1.5] with mesh points .5, .75, 1, 1.25, 1.5.
(b) Use your cubic spline to approximate the integral of $\log_e x$ from .5 to 1.5. Integrate by parts to determine the exact answer.


Interlude: Abstract Linear Transformations and Vector Spaces


In this interlude, we take a step back from matrix algebra and give a more 
general setting to the linear problems addressed in this book. By “‘linear 
problems’’ we mean problems involving equations consisting of linear com- 
binations of variables. These problems have come in two general forms: 
(i) solving a system of linear equations Ax = b, and (ii) describing the 
behavior of iterative models of the form p’ = Ap. 

In Section 4.1 we used the term linear transformation to refer to mappings T: w → w' = T(w), where w' = Aw. In that section we treated the system w' = Aw as defining a computer graphics transformation T(w) that might be applied to a stick-figure drawing or possibly the whole x-y plane (or x-y-z space). The key property of linear transformations is (Theorem 1 of Section 4.1)


T(aw + bv) = aT(w) + bT(v)    (1)


Property (1) led to the observation that linear transformations take lines into lines.

Property (1) turns out to be at the heart of all linear models. If a transformation T(w) satisfies (1), where w is an n-vector, then T(w) must actually be a matrix transformation: T(w) = Aw. That is, for vectors, property (1) defines a matrix-vector product.

We define an abstract linear transformation on n-vectors to be any mapping w' = T(w) of an n-vector w to an m-vector w' such that T satisfies (1).

Theorem 1. Any abstract linear transformation w' = T(w) can be represented by matrix multiplication: w' = Aw.


Proof: If w is an n-vector and w' is an m-vector, then A will have to be an m-by-n matrix. When transforming a set of points in Section 4.1, we used property (1) to simplify the calculations by expressing all points as linear combinations of a few key points. In this proof we reason the same way. Here the key points will be unit vectors, such as (1, 0, 0, ..., 0). Let $e_j$ be the jth unit vector in n dimensions (with a 1 in the jth position and 0's in the other n − 1 positions).

We define the jth column $a_j^C$ of A to be $T(e_j)$, the vector to which $e_j$ is mapped by T:


$a_j^C = T(e_j)$    (2)

If $w = (w_1, w_2, \ldots, w_n)$, we can write w as

$w = w_1 e_1 + w_2 e_2 + \cdots + w_n e_n$    (3)

Then by (1) and (2)

$T(w) = w_1 T(e_1) + w_2 T(e_2) + \cdots + w_n T(e_n)$    (4a)
$\phantom{T(w)} = w_1 a_1^C + w_2 a_2^C + \cdots + w_n a_n^C$    (4b)
$\phantom{T(w)} = Aw$


The linear combination in (4b) of the columns of A is exactly the definition of Aw.
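The construction in this proof is easy to try out numerically. The following Python sketch builds A column by column from the images of the unit vectors and then checks that Aw agrees with T(w). The map `T` below is a hypothetical example chosen only for illustration, and the helper names `matrix_of` and `mat_vec` are likewise made up here.

```python
def matrix_of(T, n):
    """Build the matrix whose jth column is T(e_j), as in equation (2)."""
    columns = []
    for j in range(n):
        e_j = [1 if k == j else 0 for k in range(n)]   # jth unit vector
        columns.append(T(e_j))
    m = len(columns[0])
    # Turn the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def mat_vec(A, w):
    """Matrix-vector product Aw."""
    return [sum(a_ij * w_j for a_ij, w_j in zip(row, w)) for row in A]


# Hypothetical linear map from 3-vectors to 2-vectors.
def T(w):
    x, y, z = w
    return [2 * x - y, x + 3 * z]


A = matrix_of(T, 3)
w = [4, -1, 2]
print(A)
print(mat_vec(A, w), T(w))
```

Only the n vectors T(e_j) are needed to recover T on every input, which is the point of the proof.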


The preceding proof shows that any linear transformation is specified 
by knowing what it does with a set of coordinate vectors. In the proof we 
used the unit vectors e;, but any set of vectors whose linear combinations 
yield all other vectors would work. 

In Section 2.5 we saw that for square matrices, eigenvector coordinates made matrix multiplication very easy. If $u_i$ is an eigenvector of T, then $T(u_i) = \lambda_i u_i$. In the proof above, if w has the representation in eigenvector coordinates of


$w = r_1 u_1 + r_2 u_2 + \cdots + r_n u_n$    (5)

then (4a) and (4b) become


$T(w) = r_1 T(u_1) + r_2 T(u_2) + \cdots + r_n T(u_n)$
$\phantom{T(w)} = \lambda_1 r_1 u_1 + \lambda_2 r_2 u_2 + \cdots + \lambda_n r_n u_n$    (6)


For example, the rabbit-fox growth model (Example 5 of Section 3.1)


$p' = Ap$, where $A = \begin{bmatrix} 1.1 & -.15 \\ .1 & .85 \end{bmatrix}$    (7)

has eigenvectors $u_1 = [3, 2]$ and $u_2 = [1, 1]$ with associated eigenvalues $\lambda_1 = 1$ and $\lambda_2 = .95$. So if the initial population p = [R, F] were written in eigenvector coordinates as $p = [s_1, s_2]$ ($= s_1 u_1 + s_2 u_2$), then the linear transformation T given by A becomes

$T(p) = s_1 T(u_1) + s_2 T(u_2) = s_1(1 u_1) + s_2(.95 u_2)$

or

$\begin{bmatrix} s_1' \\ s_2' \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & .95 \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} = \begin{bmatrix} s_1 \\ .95 s_2 \end{bmatrix}$    (8)


The concept of a linear transformation helps to remind us that (7) and 
(8) are the same thing—that is, are the same linear transformation—but 
expressed in different coordinate systems. 

The lesson from Section 2.5 is that, when possible, linear transformations should be expressed in eigenvector coordinates. To convert to eigenvector coordinates and afterwards to convert back to standard coordinates, we can use the matrix equation (Theorem 5 of Section 3.3)


$A = U D_\lambda U^{-1}$    (9)
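As a quick check of (9), the sketch below rebuilds the rabbit-fox matrix from the eigenvectors [3, 2] and [1, 1] and the eigenvalues 1 and .95 quoted above, and compares one step of the model computed directly with the same step computed through eigenvector coordinates. It uses exact fractions from the Python standard library; the helper names and the initial population [50, 40] are assumptions made for this illustration only.

```python
from fractions import Fraction as Fr


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of a 2-by-2 matrix with nonzero determinant."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


U = [[Fr(3), Fr(1)], [Fr(2), Fr(1)]]          # columns are u_1 and u_2
D = [[Fr(1), Fr(0)], [Fr(0), Fr(95, 100)]]    # eigenvalues 1 and .95
U_inv = inverse_2x2(U)
A = mat_mul(mat_mul(U, D), U_inv)             # equation (9)

p = [[Fr(50)], [Fr(40)]]                      # hypothetical population [R, F]
s = mat_mul(U_inv, p)                         # eigenvector coordinates of p
direct = mat_mul(A, p)                        # p' = Ap in standard coordinates
via_eigen = mat_mul(U, mat_mul(D, s))         # scale coordinates, convert back

print([[float(v) for v in row] for row in A])
print(direct == via_eigen)
```

In eigenvector coordinates the transformation is just a scaling of each coordinate, so repeated application (many time periods) only requires powers of the eigenvalues.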


The concept of a linear transformation also gives new understanding 
to the problem of solving a system of equations. Consider our refinery prob- 
lem: 


Ax = b:    Heating oil: $20x_1 + 4x_2 + 4x_3 = 500$
           Diesel oil:  $10x_1 + 14x_2 + 5x_3 = 850$    (10)
           Gasoline:    $5x_1 + 5x_2 + 12x_3 = 1000$

Viewing (10) as a linear transformation problem,

T(x) = b    (11)


we see that solving (11) for a vector x of production levels is asking for a 
vector x in the domain of 7 that is mapped by T to b. This is a vector- 
valued version of the problem: Given a function f(x) and a constant b, find 
an x for which f(x) = b. 

Just like the function version of this problem, if b is in the range of 
T, there will be at least one solution; if T is a one-to-one mapping, there 
will be at most one solution; otherwise, there may be many solutions. 

Linear transformations provide a convenient way to abstract matrix 
problems. However, matrix problems are only the beginning. Linear trans- 
formations can be defined to act on more complicated sets than vectors, such 
as functions. 

Functions can be thought of as an infinite dimensional extension of vectors. An abstract vector space is any collection C of elements that obey the law of linearity. That is, if A and B are elements of C, then rA + sB is in C, for any constants r, s. The set of all continuous functions forms an abstract vector space. The same is true for the set of all functions, continuous or not.

Example 1. Linear Transformations on Functions


(i) Shift transformation $S_a$. Let $S_a(f(x)) = f(x + a)$. That is, $S_5$ shifts the values of a function f(x) 5 units to the left:


if $f(x) = x^2 - 2x$, then $S_5(f(x)) = (x + 5)^2 - 2(x + 5)$


(ii) Reflection transformation R. Let R(f(x)) = f(−x). So R reflects the graph of any function about the y-axis.

(iii) Differentiation transformation D. Let D(f(x)) = df(x)/dx, the derivative of f(x) (assuming that the derivative exists).

(iv) Integration transformation I. Let $I(f(x)) = \int f(x)\,dx$, the integral of f(x) (integration actually requires a constant term; here we will assume that the constant is 0).


It is left to the reader to check that the transformations in Example 1, parts (i) and (ii) are indeed linear transformations. The required property, generalizing (1), is


T(af(x) + bg(x)) = aT(f(x)) + bT(g(x)) for any constants a, b    (12)

Because differentiation is so important, let us check that it is a linear transformation. Property (12) is

$\frac{d}{dx}[af(x) + bg(x)] = a\frac{d}{dx}f(x) + b\frac{d}{dx}g(x)$    (13)


But (13) is the linearity rule of derivatives. Similarly, for integration we have


$\int [af(x) + bg(x)]\,dx = a\int f(x)\,dx + b\int g(x)\,dx$    (14)
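Property (12) can also be spot-checked numerically for the shift and reflection transformations. The Python sketch below represents functions as callables and compares both sides of (12) at a handful of sample points; this is evidence, not a proof. All names here (`shift`, `reflect`, `combo`, `satisfies_12`, `square`) and the test functions are assumptions made for illustration. The squaring map is included as a contrasting transformation that fails (12).

```python
def shift(a):
    """Shift transformation S_a: maps f to the function x -> f(x + a)."""
    return lambda f: (lambda x: f(x + a))


def reflect(f):
    """Reflection transformation R: maps f to the function x -> f(-x)."""
    return lambda x: f(-x)


def square(f):
    """A nonlinear map for contrast: f -> f(x)^2."""
    return lambda x: f(x) ** 2


def combo(a, f, b, g):
    """The function a*f + b*g."""
    return lambda x: a * f(x) + b * g(x)


def satisfies_12(T, f, g, a, b, points, tol=1e-9):
    """Compare T(af + bg) with aT(f) + bT(g) at the sample points."""
    lhs = T(combo(a, f, b, g))
    rhs = combo(a, T(f), b, T(g))
    return all(abs(lhs(x) - rhs(x)) <= tol for x in points)


f = lambda x: x**2 - 2 * x
g = lambda x: x**3 + 1
points = [-2.0, -0.5, 0.0, 1.5, 3.0]
S5 = shift(5)

print(satisfies_12(S5, f, g, 2.0, -3.0, points))
print(satisfies_12(reflect, f, g, 2.0, -3.0, points))
print(satisfies_12(square, f, g, 2.0, -3.0, points))
```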


Virtually all of the theory for analyzing matrix equations extends to linear transformations of functions. For example, we can talk about inverse transformations and about eigenfunctions u(x): T(u(x)) = λu(x).


Example 2. Inverse Transformations of Linear Transformations


(i) For the shift transformation $S_a(f(x)) = f(x + a)$, the inverse transformation $S_a^{-1}$ is $S_{-a}$, since $S_{-a}[S_a(f(x))] = f(x + a - a) = f(x)$.

(ii) For the reflection transformation R(f(x)) = f(−x), the inverse $R^{-1}$ is simply the reflection transformation itself. So $R^{-1} = R$.

(iii) For differentiation, there is no (unique) inverse $D^{-1}$. If two functions differ by a constant, say, $x^2 + x + 2$ and $x^2 + x + 5$, they have the same derivative, 2x + 1. So $D^{-1}(2x + 1)$ cannot be uniquely defined.

(iv) For integration, the inverse $I^{-1}$ is differentiation. That is,

$\frac{d}{dx}\left[\int f(x)\,dx\right] = f(x)$


This is the fundamental theorem of calculus!


Example 3. Eigenfunctions of Linear Transformations


(i) For the shift transformation $S_a(f(x)) = f(x + a)$, an eigenfunction u(x) associated with eigenvalue 1 must have the property that u(x + a) = u(x). For example, when a = 2π, the trigonometric functions, such as sin x or cos x, are eigenfunctions of $S_{2\pi}$ with eigenvalue 1.

(ii) For the reflection transformation R(f(x)) = f(−x), an eigenfunction u(x) associated with λ = 1 is any symmetric u(x), that is, u(x) = u(−x).

(iii) For differentiation, $e^{kx}$ is the eigenfunction of D associated with eigenvalue λ = k, since $de^{kx}/dx = ke^{kx}$.

(iv) For integration, $e^{x/k}$ is the eigenfunction of I associated with eigenvalue λ = k, since $\int e^{x/k}\,dx = ke^{x/k}$ (integration is the reverse operation of differentiation).
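Parts (i) and (iii) can be illustrated numerically. The sketch below approximates the derivative by a central difference (an approximation introduced here, not the exact operator D) and checks that the ratio D(u)/u for $u(x) = e^{kx}$ stays close to k at several points; it also checks that shifting sin x by 2π leaves it unchanged. The value k = 3, the step size, and the sample points are arbitrary choices for this illustration.

```python
import math


def derivative(f, h=1e-5):
    """Central-difference approximation to the differentiation operator D."""
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)


k = 3.0
u = lambda x: math.exp(k * x)

# For an eigenfunction, D(u)(x) / u(x) should be the same constant k at every x.
ratios = [derivative(u)(x) / u(x) for x in (-1.0, 0.0, 0.5)]
print(ratios)

# Shift by 2*pi: sin is (up to rounding) unchanged, so its eigenvalue is 1.
shifted_sin = lambda x: math.sin(x + 2 * math.pi)
print(max(abs(shifted_sin(x) - math.sin(x)) for x in (-1.0, 0.3, 2.0)))
```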


There is one very important generalization of differentiation that bears 
special mention, namely differential equations. The differential equation 


$y''(x) - 2y'(x) = f(x)$    (15)


can be considered a linear transformation DE of y(x) to f(x), that is, 
DE( y(x)) = f(x). It is left to the reader to check that DE satisfies property 
(1). Any differential equation whose left side is a linear combination of 
derivatives will be a linear transformation. The advanced theory of differ- 
ential equations is based heavily on eigenfunctions and inverse transforma- 
tions. 

For linear transformations defined in terms of matrices, we noted that the "right" coordinates for describing the transformation are eigenvector coordinates. The same applies to linear transformations of functions. Functions should be expressed as linear combinations (infinite series) of eigenfunctions of the linear transformation.

This book is not about linear transformations of functions. It is about 
matrices and vectors. But it is important to be aware of the powerful gen- 
eralizations of matrices and matrix algebra which are the basis for much of 
higher mathematics. If the reader masters the matrix-based linear algebra in 
this book, he or she will have an excellent foundation for any future work 
with functional linear algebra. 

The purpose of this interlude has been to implant the seed in the read- 
er’s mind that many operations on functions are linear transformations and 
that most of the theory of matrices extends to these linear transformations. 
In Chapter 5, occasional examples using linear transformations of functions are given.


Theory of Systems of Linear Equations and Eigenvalue/Eigenvector Problems


Null Space and Range of a Matrix


The null space Null(A) of a matrix A is the set of vectors x that are solutions 
to the system of equations Ax = 0. The range Range(A) of A is the set of 
vectors b such that Ax = b has a solution. Another name that is sometimes 
used for Null(A) is the kernel of A. In this section we examine these two 
important sets of vectors associated with any matrix. We look at their role 
in linear models and learn how to determine these sets. 

Both the Null(A) and Range(A) are vector spaces. 


Definition. A vector space is any set V of vectors such that if $x_1$, $x_2$ are in V, then any linear combination $rx_1 + sx_2$ is also in V.


If $Ax_1 = 0$ and $Ax_2 = 0$, then we have

$A(rx_1 + sx_2) = r(Ax_1) + s(Ax_2) = r(0) + s(0) = 0$


Thus Null(A) is a vector space. A similarly simple proof, left as an 
exercise, shows that Range(A) is a vector space. 

Suppose that A is an n-by-n matrix for which the system Ax = b has 
a unique solution for every b. Then Range(A) is all possible n-vectors, and 
Null(A) is just the zero vector 0, since 0 is always a solution to Ax = 0 
and by assumption there can only be one solution to Ax = 0.

Figure 5.1 (a) Two lines intersect at a point. (b) Two parallel lines have no intersection points.


In this section we are concerned with matrices A for which Ax = b 
has nonunique solutions or no solutions. Then determining the range and 
null space of A becomes important. 

Let us briefly describe the geometry of systems of equations that do 
not have unique solutions. Consider the pair of equations 


Lh = ee (1) 
3x + 2y = 11


In Figure 5.1a we have plotted the graph of the two equations in (1) in x-y space. The graph of a linear equation is a straight line. An (x, y) pair that solves (1) must lie on the lines of both equations. That is, the (x, y) pair must be the coordinates of the point where the two lines intersect. From Figure 5.1a we see that this intersection point has coordinates (3, 1).

Suppose the two equations produce lines that are parallel:


2x - y=5 (2) 
4x —2y=4 


(see Figure 5.1b). Then there is no common point—no solution to (2). Note 
that “‘parallel’’ means that the second equation’s coefficients are multiples 
of the first’s. We saw in the canoe-with-sail example in Section 1.1 that 
when two equations produce almost parallel lines, the equations can give 
strange results. 


A system of three equations in three unknowns, such as


$2x + 5y + 4z = 4$
$x + 4y + 3z = 1$    (3)
$x - 3y - 2z = 5$

has a similar interpretation in three-dimensional space (see Figure 5.2a). Each equation now determines a plane of points. A solution of (3) will be the coordinates of a point where all three planes intersect. If two of the planes are parallel, no solution can exist.

Figure 5.2 (a) Three planes intersect at a point. (b) The lines formed by intersections of two planes are parallel. (c) Three planes intersect along a line.

Another possibility is that while each pair of equations intersects to 
form a line, there may be no point common to all three. This happens if the 
line formed by the intersection of two planes is parallel to the third plane. 
Figure 5.2b illustrates this situation; such a system of equations is 


$x - y + z = 3$
$x + y + z = 3$    (4)
$x + 3y + z = 7$


The points (x, y, z) satisfying both of the first two equations in (4) form the line x + z = 3, y = 0; the points satisfying the first and third equations form the line x + z = 4, y = 1; and the points satisfying the second and third equations form the line x + z = 1, y = 2.

Of course, when we have four or more equations in three unknowns, it is very likely that there will be no (x, y, z) that lies on all four planes (satisfies all four equations). Conversely, if we had only the first two equations in (4), all the points on the line x + z = 3, y = 0 mentioned above will be solutions.

Let us turn from systems with no solutions to 3-by-3 systems with 
multiple solutions. The points that lie on all three planes may happen to 
form a line, as in the case for the system 


4x- yt2z= 3B 
xX Dy SH (5) 
24 Foy = 7 = 4 


which has the graph shown in Figure 5.2c. The line of points (x, y, z) 
common to all three equations in (5) is 2x + z = 4, y = 0. So (5) has an
infinite number of solutions. As we go to higher dimensions, the possibilities 
for multiple solutions or no solution increase. 

We present two theorems that show the fundamental link between 
multiple solutions to the system Ax = b and the null space of A. 


Theorem 1. Let A be any m-by-n matrix.
(i) If Null(A) contains one nonzero vector x°, then Null(A) contains an infinite number of vectors; in particular, any multiple rx° is in Null(A).

(ii) If x° is in Null(A) and x* is a solution to Ax = b, then x* + x° is also a solution to Ax = b.

(iii) If $x_1$, $x_2$ are two different solutions to Ax = b, for some given b, then their difference $x_1 - x_2$ is a vector in Null(A).

(iv) Given a solution x* to Ax = b, then any other solution x' to this matrix equation can be written as

x' = x* + x°    for some x° in Null(A).

(v) If Null(A) consists of only the zero vector 0 (i.e., Ax = 0 has only the solution x = 0), then Ax = b has at most one solution, for any given b.


Proof
(i) A(rx°) = r(Ax°) = r0 = 0, so rx° is in Null(A) for any scalar r. Thus Null(A) is infinite.

(ii) Let x° be in Null(A), so Ax° = 0. Since Ax* = b, then

A(x* + x°) = Ax* + Ax° = b + 0 = b    (6)

Thus x* + x° is a solution to Ax = b, as claimed.

(iii) Since $A(x_1 - x_2) = Ax_1 - Ax_2 = b - b = 0$, then $x_1 - x_2$ is in Null(A).

(iv) Given solutions x* and x', choose x° = x' − x*. Then x' = x* + x°, and by part (iii), x° is in Null(A).

(v) Suppose that one solution x* to Ax = b is known. By part (iv), all solutions x' of Ax = b can be expressed in the form x' = x* + x°, where x° is in Null(A). If Null(A) consists of just 0, then x' = x* + 0; that is, there is only one solution, x*, to Ax = b.


Theorem 2. Let A be an m-by-n matrix. If Ax = b' has two solutions for some particular b':
(i) The null space of A has an infinite number of vectors.
(ii) For any b, either Ax = b has no solution or an infinite number of solutions.


Proof

(i) By Theorem 1, part (iii), the difference of two solutions is a (nonzero) vector x° in Null(A), and then by Theorem 1, part (i), the multiples rx° yield an infinite number of vectors in Null(A).

(ii) Suppose that Ax = b has one solution x*. From Theorem 1, part (ii), x* + x° is also a solution, for any x° in Null(A). By part (i) of this theorem, Null(A) is infinite, so Ax = b has an infinite number of solutions.


Theorem 2’s result that two solutions lead to an infinite number of 
solutions corresponds to our geometric pictures in which multiple solutions 
always consisted of an (infinite) set of points along a line. 

A system of equations with 0 on the right side—Ax = 0—is called a 
homogeneous system. Solutions to the homogeneous system Ax = 0 form 
the null space of A. One often speaks of the null space of the system 
Ax = 0, implicitly meaning the null space of A. 

Homogeneous systems Ax = 0 have arisen in several different settings 
in this book. 


Example 1. Oil Refinery Problem


Let us suppose that in our familiar oil refinery problem, the three refineries produce only the first two products:


Heating oil: $20x_1 + 4x_2 + 4x_3 = 500$    (7)
Diesel oil:  $10x_1 + 14x_2 + 5x_3 = 850$

Suppose that we are given one solution, $x_1 = 15$, $x_2 = 50$, $x_3 = 0$, using just the first two refineries. We want to find another solution with $x_3 = 20$. Let us find the null space of this coefficient matrix and then, using Theorem 1, part (iv), add an appropriate null-space vector to the given solution [15, 50, 0] to get a solution with $x_3 = 20$.

The null space for this system of equations is all solutions to the associated homogeneous system

$20x_1 + 4x_2 + 4x_3 = 0$    (8)
$10x_1 + 14x_2 + 5x_3 = 0$


Solving with elimination by pivoting, we obtain


$x_1 + \frac{3}{20}x_3 = 0$
$x_2 + \frac{1}{4}x_3 = 0$


Thus $x_1 = -\frac{3}{20}x_3$ and $x_2 = -\frac{1}{4}x_3$. So vectors in the null space have the form

$[-\frac{3}{20}x_3, -\frac{1}{4}x_3, x_3]$ or $x_3[-\frac{3}{20}, -\frac{1}{4}, 1]$    (9)


Theorem 1, part (iv), says that any solution x' to the system (7) can be expressed in the form x' = x + x°, where x° is a null-space vector, in this case $x° = r[-\frac{3}{20}, -\frac{1}{4}, 1]$ for some constant r. We said above that we want x' to have $x_3 = 20$. Then


x' = x + x°    (10)
$[x_1', x_2', 20] = [15, 50, 0] + r[-\frac{3}{20}, -\frac{1}{4}, 1]$


Matching the third entry on each side of (10), we have 20 = 0 + r. So r = 20 and the desired solution is


$x' = x + x° = [15, 50, 0] + 20[-\frac{3}{20}, -\frac{1}{4}, 1] = [12, 45, 20]$    (11)
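The arithmetic in this example is easy to confirm with exact fractions. The short Python sketch below checks that the null-space vector is sent to 0 by the coefficient matrix of (7) and that the shifted production vector still meets both demands. The helper names `apply` and `solution_with_x3` are made up for this illustration.

```python
from fractions import Fraction

A = [[20, 4, 4],      # heating oil
     [10, 14, 5]]     # diesel oil
b = [500, 850]


def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


x_given = [15, 50, 0]
null_vec = [Fraction(-3, 20), Fraction(-1, 4), Fraction(1)]


def solution_with_x3(r):
    """Given solution plus r times the null-space vector, so that x_3 = r."""
    return [xg + r * nv for xg, nv in zip(x_given, null_vec)]


x_new = solution_with_x3(20)
print(apply(A, null_vec))            # should be the zero vector
print(x_new, apply(A, x_new) == b)
```

Any other target value for the third refinery can be reached the same way by changing r, although negative production levels would of course not be meaningful.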


Example 2. Balancing Chemical Equations Revisited


In Example 1 of Section 4.3 we obtained a system of equations for balancing the atomic equations for the chemical reaction in which permanganate ($\mathrm{MnO_4}$) and hydrogen (H) ions combine to form manganese (Mn) and water ($\mathrm{H_2O}$):


$\mathrm{MnO_4} + \mathrm{H} \to \mathrm{Mn} + \mathrm{H_2O}$    (12)


where H represents hydrogen and O oxygen. We let $x_1$ be the number of permanganate ions, $x_2$ the number of hydrogen ions, $x_3$ the number of manganese atoms, and $x_4$ the number of water molecules. To have the same number of atoms in the molecules on each side of the reaction, we obtained the system of equations


H:  $x_2 = 2x_4$
Mn: $x_1 = x_3$    (13a)
O:  $4x_1 = x_4$

or

$x_2 - 2x_4 = 0$
$x_1 - x_3 = 0$    (13b)
$4x_1 - x_4 = 0$


Notice that we have four unknowns but only three equations. When we solved system (13b) using elimination by pivoting [pivoting on entries (2, 1), (1, 2), (3, 3)], we obtained


$x_2 - 2x_4 = 0$             $x_2 = 2x_4$
$x_1 - \frac{1}{4}x_4 = 0$   or   $x_1 = \frac{1}{4}x_4$    (14)
$x_3 - \frac{1}{4}x_4 = 0$             $x_3 = \frac{1}{4}x_4$


As vectors, the solutions in (14) have the form

$[\frac{1}{4}x_4, 2x_4, \frac{1}{4}x_4, x_4]$ or $x_4[\frac{1}{4}, 2, \frac{1}{4}, 1]$    (15)


These vectors form the null space for the system (13b).
For example, if $x_4 = 4$, then $x_2 = 8$, $x_1 = x_3 = 1$, and the reaction equation becomes


$\mathrm{MnO_4} + 8\mathrm{H} \to \mathrm{Mn} + 4\mathrm{H_2O}$


The solution we obtain makes the amounts of the first three types of molecules fixed ratios of the amount of the fourth type, which we are free to give any value (i.e., $x_4$ is a free variable). In another series of pivots, the final free variable might end up being $x_1$:


$-8x_1 + x_2 = 0$             $x_2 = 8x_1$
$-x_1 + x_3 = 0$    or    $x_3 = x_1$    (16)
$-4x_1 + x_4 = 0$             $x_4 = 4x_1$


yielding solution vectors

$[x_1, 8x_1, x_1, 4x_1]$ or $x_1[1, 8, 1, 4]$    (17)


However we formulate the solution of (13b), we always have the same set of vectors, namely, the null space of the coefficient matrix in (13b). For example, the vector [1, 8, 1, 4] in (17) is a multiple of the vector $[\frac{1}{4}, 2, \frac{1}{4}, 1]$ in (15).
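The pivoting procedure used in this example can be sketched as a small Python routine. This is a generic Gauss-Jordan elimination with exact fractions, written for illustration; it chooses pivots column by column from left to right, which need not match the pivot order quoted in the text, and the function name `null_space_basis` is made up here. Each free variable contributes one generating vector of the null space.

```python
from fractions import Fraction


def null_space_basis(A):
    """Vectors generating the null space of A, one per free variable."""
    rows = [[Fraction(v) for v in row] for row in A]
    n_cols = len(rows[0])
    pivot_cols = []
    r = 0                                   # next row to receive a pivot
    for col in range(n_cols):
        if r == len(rows):
            break
        # Look for a row at or below r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                        # no pivot: this variable is free
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pivot_value = rows[r][col]
        rows[r] = [v / pivot_value for v in rows[r]]
        # Clear the pivot column in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[r])]
        pivot_cols.append(col)
        r += 1
    free_cols = [c for c in range(n_cols) if c not in pivot_cols]
    basis = []
    for fc in free_cols:
        vec = [Fraction(0)] * n_cols
        vec[fc] = Fraction(1)               # set this free variable to 1
        for row_index, pc in enumerate(pivot_cols):
            vec[pc] = -rows[row_index][fc]  # solve for each pivot variable
        basis.append(vec)
    return basis


# Coefficient matrix of (13b): rows for H, Mn, and O.
chem = [[0, 1, 0, -2],
        [1, 0, -1, 0],
        [4, 0, 0, -1]]
basis = null_space_basis(chem)
print([[str(v) for v in vec] for vec in basis])
```

Scaling the single generating vector by 4 gives whole-number amounts for the reaction.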


Examples 1 and 2 show us how to determine the null space of any matrix. We simply apply elimination by pivoting to the homogeneous system Ax = 0 and reduce it to the form (14), from which we obtain the null-space vectors as expressed in (15).

Example 3. The Null Space of the Frog Markov Chain


We have already performed elimination by pivoting on this transition matrix A in Example 6 of Section 3.3 when we were trying to invert the matrix. We do the pivoting again, but now the right-side vector is 0:


Ap = 0


Pivoting on entries (1, 1), (2, 2), (3, 3), (4, 4), and (5, 5), we obtain


$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & -2 \\ 0 & 0 & 1 & 0 & 0 & 2 \\ 0 & 0 & 0 & 1 & 0 & -2 \\ 0 & 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \\ p_6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$    (18)

or

$p_1 + p_6 = 0$
$p_2 - 2p_6 = 0$
$p_3 + 2p_6 = 0$
$p_4 - 2p_6 = 0$
$p_5 + 2p_6 = 0$

Equations (18) express the first five $p_i$'s in terms of $p_6$. Rewriting (18), we get

$p_1 = -p_6$, $p_2 = 2p_6$, $p_3 = -2p_6$, $p_4 = 2p_6$, $p_5 = -2p_6$


and thus the solutions to Ap = 0 have the form


$[-p_6, 2p_6, -2p_6, 2p_6, -2p_6, p_6]$ or $p_6[-1, 2, -2, 2, -2, 1]$    (19)


The vectors in (19) are the null space of A.

We can add a null-space vector p° like (19) to a probability vector p, and if Ap = p', then also A(p + p°) = p' [this fact is Theorem 1, part (ii)]. For example, we found earlier that p* = [.1, .2, .2, .2, .2, .1] is the stable distribution for the frog Markov chain, so that Ap* = p*. So for any null-space vector p°, we have A(p* + p°) = Ap* = p*. Suppose that we select for the null-space vector p° = [−.1, .2, −.2, .2, −.2, .1] [with $p_6 = .1$ in (19)]. Then


p* + p° = [.1, .2, .2, .2, .2, .1] + [−.1, .2, −.2, .2, −.2, .1]
        = [0, .4, 0, .4, 0, .2]


Thus if we start with distribution [0, .4, 0, .4, 0, .2], we will reach the stable distribution p* after just one period. (The reader should verify this result numerically.)
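The numerical verification suggested here might look as follows. The transition matrix is not displayed in this passage, so the sketch assumes a six-state tridiagonal matrix (entries 1/2 and 1/4); that matrix is an assumption, chosen because it is consistent with the stable distribution p* and the null-space vector in (19) quoted above. With a different transition matrix the same three checks would still be the ones to run.

```python
from fractions import Fraction as Fr

q, h = Fr(1, 4), Fr(1, 2)
# ASSUMED transition matrix (each column sums to 1); not taken from this passage.
A = [[h, q, 0, 0, 0, 0],
     [h, h, q, 0, 0, 0],
     [0, q, h, q, 0, 0],
     [0, 0, q, h, q, 0],
     [0, 0, 0, q, h, h],
     [0, 0, 0, 0, q, h]]


def apply(A, p):
    """One period of the Markov chain: the product Ap."""
    return [sum(a * pi for a, pi in zip(row, p)) for row in A]


p_star = [Fr(1, 10), Fr(2, 10), Fr(2, 10), Fr(2, 10), Fr(2, 10), Fr(1, 10)]
null_vec = [-1, 2, -2, 2, -2, 1]                       # vector from (19) with p_6 = 1
start = [ps + Fr(1, 10) * nv for ps, nv in zip(p_star, null_vec)]

print([float(v) for v in start])
print(apply(A, null_vec) == [0] * 6)    # null-space vector is sent to 0
print(apply(A, start) == p_star)        # stable distribution after one period
```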


tie a 
Example 4. A Two-Variable Null Space 


The following system of equations formed constraints in the transpor- 
tation problem presented in Section 4.6; the names of the variables 
have been changed for simplicity. (Background: The first equation 
represents the fact that the amount x, of food shipped from the first 
warehouse to college A plus the amount x, shipped from the first ware- 
house to college B equals 20, the amount of food in the first warehouse. 
The other equations represent the amounts available at the second and 
third warehouses and the amounts needed at colleges A and B. There 
are many solutions—ways to ship the food between the three ware- 
houses and the two colleges. In Section 4.6 each x; had an associated 
cost and we sought to minimize the total cost.) 


1 = 20 
xX; ity = 30 

xX, +X, = 15 (20) 
Xx; ee + Xs = 25 
X5 te Bs + x, = 40 


To change from one solution $\mathbf{x}^*$ of (20) to another (cheaper) solution
$\mathbf{x}^{**}$, we would add some null-space vector to $\mathbf{x}^*$ according to
Theorem 1. To find the null space, we solve the associated homogeneous
system

$$\begin{aligned}
x_1 + x_2 &= 0 \\
x_3 + x_4 &= 0 \\
x_5 + x_6 &= 0 \\
x_1 + x_3 + x_5 &= 0 \\
x_2 + x_4 + x_6 &= 0
\end{aligned} \qquad (21)$$


Pivoting on entries (1, 2), (2, 4), (3, 6), (4, 3), we obtain

$$\begin{aligned}
x_1 + x_2 &= 0 \\
-x_1 + x_4 - x_5 &= 0 \\
x_5 + x_6 &= 0 \\
x_1 + x_3 + x_5 &= 0 \\
0 &= 0
\end{aligned}
\qquad\text{or}\qquad
\begin{aligned}
x_2 &= -x_1 \\
x_4 &= x_1 + x_5 \\
x_6 &= -x_5 \\
x_3 &= -x_1 - x_5
\end{aligned} \qquad (22)$$


The solutions in (22) to the homogeneous system (21) produce the set
of null-space vectors

$$[x_1,\ -x_1,\ -x_1 - x_5,\ x_1 + x_5,\ x_5,\ -x_5] \qquad (23)$$

Breaking the $x_1$ and $x_5$ components apart, we can rewrite (23) as

$$x_1[1, -1, -1, 1, 0, 0] + x_5[0, 0, -1, 1, 1, -1] \qquad (24)$$

So the null space is all linear combinations of the two vectors
$\mathbf{x}_1^* = [1, -1, -1, 1, 0, 0]$ and $\mathbf{x}_2^* = [0, 0, -1, 1, 1, -1]$.


Suppose that we are given the following solution $\mathbf{x}$ to (20): $x_1 = 20$,
$x_3 = 5$, $x_4 = 25$, $x_6 = 15$, and $x_2 = x_5 = 0$. Thus

$$\mathbf{x} = [20, 0, 5, 25, 0, 15]$$

Also suppose that we want a solution in which $x_2 = 10$ and
$x_5 = 5$. We can achieve this by adding the right linear combination
of the null-space vectors $\mathbf{x}_1^* = [1, -1, -1, 1, 0, 0]$ and
$\mathbf{x}_2^* = [0, 0, -1, 1, 1, -1]$. Remember that adding any null-space vector
to a solution vector yields another solution vector.

To make $x_2 = 10$ (it is now 0), we can add to $\mathbf{x}$ the vector
$-10\mathbf{x}_1^* = [-10, 10, 10, -10, 0, 0]$. To make $x_5 = 5$ (also now 0),
we can add to $\mathbf{x}$ the vector $5\mathbf{x}_2^* = [0, 0, -5, 5, 5, -5]$. So our
desired solution is

$$\begin{aligned}
\mathbf{x} - 10\mathbf{x}_1^* + 5\mathbf{x}_2^* &= [20, 0, 5, 25, 0, 15] \\
&\quad + [-10, 10, 10, -10, 0, 0] \\
&\quad + [0, 0, -5, 5, 5, -5] \\
&= [10, 10, 10, 20, 5, 10]
\end{aligned}$$
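As a quick check of this example, the following Python sketch confirms that the two generating vectors are sent to zero by the coefficient matrix of (20) and that the adjusted vector still satisfies the system. The variable names (`n1`, `n2`, `new_x`) are hypothetical, chosen only for this illustration.

```python
# Coefficient matrix and right side of system (20).
A = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
]
b = [20, 30, 15, 25, 40]

def matvec(M, v):
    """Return the matrix-vector product M v as a list."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

x = [20, 0, 5, 25, 0, 15]       # given solution of (20)
n1 = [1, -1, -1, 1, 0, 0]       # first null-space generator
n2 = [0, 0, -1, 1, 1, -1]       # second null-space generator

# x - 10*n1 + 5*n2, computed componentwise
new_x = [xi - 10 * a + 5 * c for xi, a, c in zip(x, n1, n2)]

print(matvec(A, n1), matvec(A, n2))   # both should be all zeros
print(new_x)
print(matvec(A, new_x) == b)
```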


The null space in Example 4 is a little more complicated. If we had 
performed a different pivot sequence in (22), the two vectors that generated 
the null space would be different from the two vectors in (24). It is not even 
obvious that we would end up with two vectors. Maybe this same null space 
could be expressed as combinations of three vectors. Could it be expressed 
as multiples of a single vector? Probably not. 
In Section 5.2 we use the theory of vector spaces to prove that any
pivot sequence yields the same number of vectors for building the null space.


Optional Example on the Null Space of the 
Differentiation Transformation 


In the Interlude before this chapter, we noted that the operation D(f(x))
of taking the derivative of a function f(x) is a linear transformation
of functions. D acts on the abstract vector space of all differentiable
functions.

Let us determine the null space of the differentiation transformation
and see how the results in Theorem 1 apply to D. Null(D) is
the set of functions f(x) for which D(f(x)) = 0, that is, f'(x) = 0.
These are just the constant functions, f(x) = c for some constant c.

Null(D) = {all constant functions}

Suppose that we want to solve the problem

$$D(f(x)) = 9x^2 + x \quad [\text{find } f(x) \text{ such that } f'(x) = 9x^2 + x] \qquad (25)$$

By Theorem 1, part (ii), if $f^*(x)$ is a solution to (25), then $f^*(x)$
plus any constant function [i.e., plus any member of Null(D)] is also
a solution. For instance, $f^*(x) = 3x^3 + x^2/2$ is a solution, so
$3x^3 + x^2/2 + c$, for any constant c, is also a solution.

Conversely, by Theorem 1, part (iv), every solution to (25) can
be written as the sum of a specific solution $f^*(x)$ to (25) plus some
constant function [a member of Null(D)]. In this case, every solution
has the form $3x^3 + x^2/2 + c$. This result is the basic formula about
the form of the integral (the antiderivative) of a function.
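For polynomials, this can be illustrated with a small Python sketch. It assumes a polynomial is stored as a coefficient list `[c0, c1, c2, ...]`, where `ck` multiplies $x^k$; the function name `derivative` is hypothetical.

```python
from fractions import Fraction

def derivative(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...]."""
    # k * c_k is the coefficient of x**(k-1); drop the k = 0 term.
    return [k * c for k, c in enumerate(coeffs)][1:] or [0]

particular = [0, 0, Fraction(1, 2), 3]            # 3x^3 + x^2/2
shifted = [Fraction(7), 0, Fraction(1, 2), 3]     # same, plus the constant 7

print(derivative([5]))          # a constant function lies in Null(D)
print(derivative(particular))   # coefficients of 9x^2 + x
print(derivative(shifted))      # adding a constant does not change D(f)
```

Both polynomial derivatives come out as the coefficient list of $9x^2 + x$, matching the claim that solutions differ only by a member of Null(D).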


We close this discussion of null spaces by noting a close relationship
between null spaces and eigenvectors. An eigenvector $\mathbf{u}$, with associated
eigenvalue $\lambda$, satisfies the n-by-n homogeneous system

$$(A - \lambda I)\mathbf{u} = \mathbf{0} \quad (\text{or more familiarly, } A\mathbf{u} = \lambda\mathbf{u})$$

Thus an eigenvector associated with $\lambda$ is a member of the null space of
$(A - \lambda I)$. For example, if A is the transition matrix of the frog Markov
chain, then the stable distribution $\mathbf{p}^*$ satisfies $A\mathbf{p}^* = \mathbf{p}^*$ or
$(A - I)\mathbf{p}^* = \mathbf{0}$, so $\mathbf{p}^*$ is in the null space of $A - I$.

Conversely, $A\mathbf{u} = \mathbf{0}$ is equivalent to $(A - 0I)\mathbf{u} = \mathbf{0}$. So any nonzero
vector $\mathbf{u}$ in the null space of A is an eigenvector of A associated with
eigenvalue $\lambda = 0$.
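Both statements can be checked exactly for the frog Markov chain with the sketch below. The matrix entries are assumed (read from the display in Example 9), and `shifted` is a hypothetical helper that forms $A - \lambda I$.

```python
from fractions import Fraction as F

h, q = F(1, 2), F(1, 4)
# Frog Markov chain transition matrix (assumed, from Example 9).
A = [
    [h, q, 0, 0, 0, 0],
    [h, h, q, 0, 0, 0],
    [0, q, h, q, 0, 0],
    [0, 0, q, h, q, 0],
    [0, 0, 0, q, h, h],
    [0, 0, 0, 0, q, h],
]

def matvec(M, v):
    """Return the matrix-vector product M v as a list."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def shifted(M, lam):
    """Return the matrix M - lam * I."""
    n = len(M)
    return [[M[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

p_star = [F(1, 10), F(2, 10), F(2, 10), F(2, 10), F(2, 10), F(1, 10)]
u = [-1, 2, -2, 2, -2, 1]      # generator of Null(A)

in_null_of_A_minus_I = all(x == 0 for x in matvec(shifted(A, 1), p_star))
in_null_of_A = all(x == 0 for x in matvec(shifted(A, 0), u))
print(in_null_of_A_minus_I, in_null_of_A)
```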

So far we have considered systems with multiple solutions. Next we
discuss systems with no solution, where Ax = b cannot be solved for some
vectors b. Recall that the set of b's for which Ax = b can be solved is
called the range of A. The following simple growth model presents a system
with no solution.


Example 5. An Inconsistent System of Equations


Let $c_1$ be the value of currency on isle 1 and $c_2$ the value of currency on isle
2. Suppose that the growth equations for currency in the next period
are

$$\begin{aligned}
c_1' &= .4c_1 + .5c_2 \\
c_2' &= .6c_1 + .5c_2
\end{aligned} \qquad (26)$$

Consider the problem of picking $c_1$ and $c_2$ so that in the next period,
isle 1 has 100 dollars more and isle 2 has 200 dollars less: $c_1' = c_1 + 100$ and
$c_2' = c_2 - 200$. Then $c_1$, $c_2$ must satisfy the equations

$$\begin{aligned}
c_1 + 100 &= c_1' = .4c_1 + .5c_2 \\
c_2 - 200 &= c_2' = .6c_1 + .5c_2
\end{aligned}$$

or

$$\begin{aligned}
.6c_1 - .5c_2 &= -100 \\
-.6c_1 + .5c_2 &= 200
\end{aligned} \qquad (27)$$

When we pivot on entry (1, 1) in (27), we obtain

$$\begin{aligned}
c_1 - \tfrac{5}{6}c_2 &= -\tfrac{500}{3} \\
0 &= 100
\end{aligned} \qquad (28)$$

The second equation in (28) is an impossibility.


A system of equations is called inconsistent if, when reduced, it yields 
an impossible equation. If a system of equations is inconsistent, no solution 
is possible. Conversely, if no solution is possible, elimination must reveal 
an inconsistency—otherwise, elimination will produce a solution. 
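A minimal sketch of this test, assuming exact arithmetic with Python's `fractions` module: reduce the augmented matrix [A | b] and look for a row that reads 0 = nonzero. The function names `reduce_system` and `is_inconsistent` are hypothetical, and the pivots are taken left to right rather than in any order prescribed by the text.

```python
from fractions import Fraction

def reduce_system(rows):
    """Gauss-Jordan elimination on an augmented matrix [A | b]."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0]) - 1
    r = 0                                   # next row available for a pivot
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot possible in this column
        M[r], M[pivot] = M[pivot], M[r]
        p = M[r][c]
        M[r] = [x / p for x in M[r]]
        for i in range(n_rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return M

def is_inconsistent(reduced):
    """True if some row says 0 = (nonzero number)."""
    return any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in reduced)

# System (27) from Example 5, with exact decimal coefficients.
system = [
    [Fraction(6, 10), Fraction(-5, 10), -100],
    [Fraction(-6, 10), Fraction(5, 10), 200],
]
reduced = reduce_system(system)
print(reduced[1])
print(is_inconsistent(reduced))
```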

An extreme case of inconsistency arises in regression. 


Example 6. The Regression Model y = qx + r as a
System of Equations to Be Solved


Suppose that we have the set of (x, y) points (1, 3), (2, 5), (3, 4),
(3, 6), (4, 7), and (4, 6). The x-value might represent the number of
years of college and the y-value the score on some graduate admissions
test. We want to fit a line of the form y = qx + r to these data. That
is, we want the best possible estimates $\hat{y}_i$ for each $y_i$ when $\hat{y}$ is the
linear function qx + r. Trying to draw a line that actually passes
through these six points is equivalent to trying to solve the system of
equations

$$\begin{aligned}
3 &= 1q + r \\
5 &= 2q + r \\
4 &= 3q + r \\
6 &= 3q + r \\
7 &= 4q + r \\
6 &= 4q + r
\end{aligned}
\qquad\text{or}\qquad
\mathbf{y} = X\mathbf{q}, \quad\text{with}\quad
\mathbf{y} = \begin{bmatrix} 3 \\ 5 \\ 4 \\ 6 \\ 7 \\ 6 \end{bmatrix},\quad
X = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ 3 & 1 \\ 4 & 1 \\ 4 & 1 \end{bmatrix},\quad
\mathbf{q} = [q, r] \qquad (29)$$

Clearly, no solution is possible here in the regular sense: system (29)
is inconsistent. However, an approximate solution to $\mathbf{y} = X\mathbf{q}$ is
provided by the least-squares fit of regression theory.
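The least-squares fit is not developed in this section, so the sketch below is only an illustration under an assumption: it uses the standard closed-form formulas for simple linear regression (the solution of the normal equations) to compute q and r for the six data points.

```python
from fractions import Fraction

points = [(1, 3), (2, 5), (3, 4), (3, 6), (4, 7), (4, 6)]
n = len(points)

# Sums that appear in the normal equations for the line y = q*x + r.
sx = sum(x for x, _ in points)
sy = sum(y for _, y in points)
sxy = sum(x * y for x, y in points)
sxx = sum(x * x for x, _ in points)

q = Fraction(n * sxy - sx * sy, n * sxx - sx * sx)   # slope
r = (sy - q * sx) / n                                # intercept

# Residuals y_i - (q*x_i + r); they cannot all be zero for this data
# because x = 3 and x = 4 each appear with two different y-values.
residuals = [y - (q * x + r) for x, y in points]
print(q, r)
print([float(e) for e in residuals])
```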


Let us turn our attention to the task of determining the range of a 
matrix, that is, to finding those b’s for which the system Ax = b has a 
solution. 


Example 7. Range of a Projection Transformation


In Section 4.1 we observed that the linear transformation

$$\begin{aligned}
x' &= .5x + .5y \\
y' &= .5x + .5y
\end{aligned} \qquad (30)$$

projects all (x, y) points onto the line x' = y': It maps any [x, y] to
the point [(x + y)/2, (x + y)/2] (see Figure 4.3).

Let A be the coefficient matrix in (30). The range of A is the set
of vectors [x', y'] such that x' = y'. Let us derive this defining equation
for the range directly from A. Letting $\mathbf{w} = [x, y]$ and $\mathbf{w}' = [x', y']$,
we rewrite the transformation $A\mathbf{w} = \mathbf{w}'$ in (30) as $A\mathbf{w} = I\mathbf{w}'$:

$$\begin{aligned}
.5x + .5y &= 1x' + 0y' \\
.5x + .5y &= 0x' + 1y'
\end{aligned} \qquad (31)$$

Now we try to perform elimination by pivoting to convert
$A\mathbf{w} = I\mathbf{w}'$ into the form $I\mathbf{w} = A^{-1}\mathbf{w}'$, as we do when computing the
inverse of A. We subtract the first equation from the second (to eliminate
x from the second equation) and divide the first equation by .5
to obtain

$$\begin{aligned}
x + y &= 2x' \\
0 &= -x' + y'
\end{aligned} \qquad (32)$$

The second equation in (32), which can be rewritten x' = y', gives a
condition that must be true for any $\mathbf{w}' = [x', y']$ satisfying
$A\mathbf{w} = I\mathbf{w}'$. We have verified that vectors [x', y'] with x' = y' form the range
of A.


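Example 8. The Range of a Refinery
Production Problem


Consider the following variation on our refinery production problem.
Now there are just two refineries and their collective output vector
$[b_1, b_2, b_3]$ is given by

$$\begin{aligned}
\text{Heating oil:}\quad 20x_1 + 4x_2 &= b_1 \\
\text{Diesel oil:}\quad 10x_1 + 14x_2 &= b_2 \\
\text{Gasoline:}\quad 5x_1 + 5x_2 &= b_3
\end{aligned} \qquad (33)$$

We seek an equation describing possible output vectors. That is, if A
is the coefficient matrix of (33), we seek a defining constraint on the
range vectors of A.

We use the technique introduced in Example 7. We write the
system Ax = b in (33) as Ax = Ib and perform elimination by
pivoting at entry (1, 1) and then entry (2, 2) in the augmented matrix
[A I]:

$$\left[\begin{array}{cc|ccc}
20 & 4 & 1 & 0 & 0 \\
10 & 14 & 0 & 1 & 0 \\
5 & 5 & 0 & 0 & 1
\end{array}\right]
\rightarrow
\left[\begin{array}{cc|ccc}
1 & \tfrac{1}{5} & \tfrac{1}{20} & 0 & 0 \\
0 & 12 & -\tfrac{1}{2} & 1 & 0 \\
0 & 4 & -\tfrac{1}{4} & 0 & 1
\end{array}\right]
\rightarrow
\left[\begin{array}{cc|ccc}
1 & 0 & \tfrac{7}{120} & -\tfrac{1}{60} & 0 \\
0 & 1 & -\tfrac{1}{24} & \tfrac{1}{12} & 0 \\
0 & 0 & -\tfrac{1}{12} & -\tfrac{1}{3} & 1
\end{array}\right] \qquad (34)$$

The reduced augmented matrix in (34) corresponds to the system of
equations

$$\begin{aligned}
x_1 &= \tfrac{7}{120}b_1 - \tfrac{1}{60}b_2 \\
x_2 &= -\tfrac{1}{24}b_1 + \tfrac{1}{12}b_2 \\
0 &= -\tfrac{1}{12}b_1 - \tfrac{1}{3}b_2 + b_3
\end{aligned} \qquad (35)$$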


The last equation in (35) can be rewritten as

$$b_3 = \tfrac{1}{12}b_1 + \tfrac{1}{3}b_2 = \tfrac{1}{12}(b_1 + 4b_2) \qquad (36)$$

This is the range constraint we were looking for. In terms of refinery
production, it means that we can achieve any production vector b in
which the gasoline production ($b_3$) equals $\tfrac{1}{12}$ the sum of the heating
oil production ($b_1$) plus four times the diesel oil production ($b_2$).

Suppose that heating oil ($b_1$) and diesel oil ($b_2$) are the outputs
of primary interest and we want $b_1 = 300$ and $b_2 = 300$. Then we
pick a $b_3$ using (36) to get a vector in the range. We set

$$b_3 = \tfrac{1}{12}(b_1 + 4b_2) = \tfrac{1}{12}(300 + 4 \cdot 300) = 125$$

Now we can determine the appropriate production levels $x_1$, $x_2$
from the first two equations in (35). With b = [300, 300, 125],

$$\begin{aligned}
x_1 &= \tfrac{7}{120}b_1 - \tfrac{1}{60}b_2 = \tfrac{7}{120} \cdot 300 - \tfrac{1}{60} \cdot 300 = 17.5 - 5 = 12.5 \\
x_2 &= -\tfrac{1}{24}b_1 + \tfrac{1}{12}b_2 = -\tfrac{1}{24} \cdot 300 + \tfrac{1}{12} \cdot 300 = -12.5 + 25 = 12.5
\end{aligned}$$

As a check, the gasoline equation gives 5(12.5) + 5(12.5) = 125.
(Note that some vectors in the range may be infeasible because
they would make $x_1$ or $x_2$ negative.)
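The technique of Examples 7 and 8 can be sketched in Python as follows: reduce [A | I] and read a range constraint from every row whose left part has become all zeros. The function names are hypothetical, exact fractions are used to avoid rounding, and the pivots are taken left to right, which here matches the pivots (1, 1) and (2, 2) used above.

```python
from fractions import Fraction

def reduce_augmented(A):
    """Reduce [A | I] by Gauss-Jordan pivoting on the columns of A."""
    m, n = len(A), len(A[0])
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(m)]
         for i, row in enumerate(A)]
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        p = M[r][c]
        M[r] = [x / p for x in M[r]]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    return M

def range_constraints(A):
    """Each returned row c satisfies c . b = 0 for every b in the range of A."""
    n = len(A[0])
    return [row[n:] for row in reduce_augmented(A) if all(x == 0 for x in row[:n])]

refinery = [[20, 4], [10, 14], [5, 5]]            # coefficient matrix of (33)
projection = [[Fraction(1, 2), Fraction(1, 2)],   # coefficient matrix of (30)
              [Fraction(1, 2), Fraction(1, 2)]]

print(range_constraints(refinery))
print(range_constraints(projection))
```

For the refinery matrix the single constraint row corresponds to the last equation of (35); for the projection matrix it corresponds to 0 = -x' + y'.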


Let us try out this method for finding the range of the transition matrix 
A in our frog Markov chain. 


Example 9. Range of Frog Markov Chain


As in Example 8 we try to convert $A\mathbf{p} = I\mathbf{p}'$ into $I\mathbf{p} = A^{-1}\mathbf{p}'$ using
elimination by pivoting. We already attempted this inversion in Example 6
of Section 3.3. We started with the augmented matrix of
$A\mathbf{p} = I\mathbf{p}'$,

$$\left[\begin{array}{cccccc|cccccc}
.50 & .25 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
.50 & .50 & .25 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & .25 & .50 & .25 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & .25 & .50 & .25 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & .25 & .50 & .50 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & .25 & .50 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]$$

and ended after pivoting on entries (1, 1), (2, 2), (3, 3), (4, 4), and
(5, 5) with

$$\left[\begin{array}{cccccc|cccccc}
1 & 0 & 0 & 0 & 0 & 1 & 10 & -8 & 6 & -4 & 2 & 0 \\
0 & 1 & 0 & 0 & 0 & -2 & -16 & 16 & -12 & 8 & -4 & 0 \\
0 & 0 & 1 & 0 & 0 & 2 & 12 & -12 & 12 & -8 & 4 & 0 \\
0 & 0 & 0 & 1 & 0 & -2 & -8 & 8 & -8 & 8 & -4 & 0 \\
0 & 0 & 0 & 0 & 1 & 2 & 4 & -4 & 4 & -4 & 4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & -1 & 1 & -1 & 1
\end{array}\right] \qquad (37)$$

Again the last equation in (37) gives the constraint required for
a vector $\mathbf{p}'$ to be in the range of A, namely,

$$0 = -p_1' + p_2' - p_3' + p_4' - p_5' + p_6'$$

or

$$p_2' + p_4' + p_6' = p_1' + p_3' + p_5' \qquad (38)$$


This is a nice simple formula that allows us to test whether a probability 
vector p’ can be a next-state distribution. 

It is left as an exercise for the reader to explain in terms of the 
frog Markov chain model why even-state probabilities must equal odd- 
state probabilities. 

Before leaving this example, note that we can apply this technique
for determining the range not only to A but to powers of A. The
range of $A^k$ tells us possible distributions for the Markov chain after k
periods. What happens as k goes to infinity? The range should contract
to multiples of the stable distribution [.1, .2, .2, .2, .2, .1]. This
behavior is explored in Exercise 25.
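The constraint that even-state and odd-state probabilities are equal can be checked numerically for A and for its powers. The sketch below assumes the frog transition matrix as displayed in this example and uses a hypothetical helper `parity_gap`; it checks the constraint only, and does not compute the range of $A^k$ itself.

```python
from fractions import Fraction as F

h, q = F(1, 2), F(1, 4)
# Frog Markov chain transition matrix (assumed, from this example).
A = [
    [h, q, 0, 0, 0, 0],
    [h, h, q, 0, 0, 0],
    [0, q, h, q, 0, 0],
    [0, 0, q, h, q, 0],
    [0, 0, 0, q, h, h],
    [0, 0, 0, 0, q, h],
]

def matvec(M, v):
    """Return the matrix-vector product M v as a list."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def parity_gap(p):
    """(p2 + p4 + p6) - (p1 + p3 + p5), with states numbered from 1."""
    return sum(p[1::2]) - sum(p[0::2])

start = [F(1), 0, 0, 0, 0, 0]       # frog surely in state 1
print(parity_gap(start))            # nonzero: not a possible next-state vector

p = start
gaps = []
for k in range(1, 6):               # distributions after 1, 2, ..., 5 periods
    p = matvec(A, p)
    gaps.append(parity_gap(p))
print(gaps)                         # every gap should be zero
```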


To summarize what we now know about the range of an m-by-n matrix
A, elimination by pivoting when applied to Ax = Ib results in one of three
possibilities [cases 2 and 3 may both apply]:

1. The reduced form of A is I. Then for each b, Ax = b has a unique
solution, and Null(A) contains only the zero vector.

2. The reduced form of A contains one or more rows of zeros. Then the
right sides of these zero rows yield defining constraints that a vector b
in the range of A must satisfy.

3. The reduced form of A contains an m-by-m I plus additional columns.
Then, for each b, Ax = b has an infinite number of solutions (the
additional columns give rise to the null-space vectors).


Section 5.1 Exercises 


Summary of Exercises 

Exercises 1-4 involve plotting and graphically solving systems of equations.
Exercises 5-17 require finding null spaces and related particular solutions
to various matrix systems. Exercises 18-25 require finding a constraint
equation for the range and finding particular range vectors. Exercises 26-28 are
some simple theory questions.


1. Plot the lines of solutions to the following systems of equations.
(a) 2x + 3y = 10          (b) x - 2y = -4
    6x + 9y = 30              -3x + 6y = 12


2. Describe and sketch the solution sets, if any, of the following systems 
of equations. 


(a) x + 2Zy + 3z = 0 (b) 2x + yr z=3 
4x + 3y + 5z = 0 x~— 2y -— 327=2 
=4y 4+ 2Zy + 27 = 0 3x + 4y + 5z = 3 

eh ee Sy) = 2S S (a) x2 2y + 32 = 3 
ChE YF LES Le er iS = 4 
3K = Yo (Oz = 11 2x + 10y + 10z = 8 

(e) 2x + 2y + 2z = 
Seth  —us =. o 
=—o = ++ Bre =7 


3. For each of the following systems of equations, plot each line and from 
your drawing determine whether there is no solution, one solution, or 
an infinite set of solutions. 

(a) «axe — Zy = 15 (hb) 3x — 2y 
—6x + 4y = 10 axe = Fy 


5 (C). Sx 2Zy 
5 9x — 6y 


5 
15 


4. For each of the systems of equations in Exercise 2, sketch as best you 
can the plane determined by each equation. From your sketch, guess 
whether there is no solution, one solution, or an infinite set of solutions. 
Verify your guess by solving the system. 


5. Give a vector, if one exists, that generates the null space of the follow- 
ing systems of equations or matrices. Which of these seven sys- 
tems/matrices are invertible? [Consider the coefficient matrix and ig- 
nore the particular right-side values in parts (e) and (f).] 


| = 4 -1 2 
®) ie 4 my es 3 
oat! 3 -~—] 
—j 3 l 2 l | 
(c) eee | 3 (dyiiho-2. <3 
2 l 2 3 4 5 
fe «so 29 # 2 = 6 (ff) x- ytz=3 
See) Se eS iZe = 4 Be opty SiS 


A: ee) ee a ae | 
(g) The coefficient matrix X in the six regression equations in Example 6.


6. For the following matrices, describe the set of vectors in the null space. 


ay - te © 4 ; «+ Se a 
fee See he bE «2 4h3 


| l Sf) = 


(c) 


0 
| 
l 
0 
l 


7. (a) For A in Exercise 6, part (a) and b = [10, 10], if Ax = b has the
given solution x' = [0, 0, 10], find the family of all solutions to
Ax = b.
(b) Find a solution to Ax = b in part (a) with x, = 3. 


8. (a) For A in Exercise 6, part (b) and b = [30, 30, 20], if Ax = b
has the given solution x' = [10, 10, 0, 0], find the family of all
solutions to Ax = b.

(b) Find a solution to Ax = b in part (a) with x, = 5. 


9. (a) For A in Exercise 6, part (c) and b = [10, 15, 5, 0, 15], if
Ax = b has the given solution x' = [5, 0, 5, 0, 5], find the family
of all solutions to Ax = b.
(b) Find a solution to Ax = b in part (a) with x, = 10 and x; = 10. 
(c) Find a solution to Ax = b in part (a) with x, = 10 and x, = 5. 


10. Consider the modified refinery system from Example 1:

$$\begin{aligned}
20x_1 + 4x_2 + 4x_3 &= 700 \\
10x_1 + 14x_2 + 5x_3 &= 500
\end{aligned}$$

Given the solution $x_1 = 31$, $x_2 = 10$, $x_3 = 10$, use the appropriate
null-space vector to obtain a second solution in which the following is
true.
(a)x,=22 (b)x=25 (©) x, = 28 


11. Consider the following refinery-type problem:

$$\begin{aligned}
10x_1 + 5x_2 + 5x_3 &= 300 \\
5x_1 + 10x_2 + 8x_3 &= 300
\end{aligned}$$
(a) Find the null space of the system.

(b) Given the solution $x_1 = x_2 = 20$, $x_3 = 0$, find a second solution
with x, = 10.

(c) Repeat part (b) to find a second solution with x, = 15.


12. The rabbit-fox growth model from Section 1.3,

R' = R + .1R - .15F
F' = F + .1R - .15F

had stable values for which R' = R and F' = F when the monthly
change was zero:

R' - R = .1R - .15F = 0
F' - F = .1R - .15F = 0

Solutions for these homogeneous equations were of the form [R, F] =
r[3, 2]. Suppose we want a vector [R, F] that remains stable when 30
rabbits and 30 foxes are killed each month by hunters.

R' - R = .1R - .15F - 30 = 0   =>   .1R - .15F = 30
F' - F = .1R - .15F - 30 = 0   =>   .1R - .15F = 30

Find the family of stable population vectors in this case by adding
one particular solution to the set of homogeneous solutions. Find a
stable vector with F = 400.


13. Write out a system of equations required to balance the following
chemical reaction and solve.

$$\mathrm{SO_2 + NO_3 + H_2O \rightarrow H + SO_4 + NO}$$

where S represents sulfur, N nitrogen, H hydrogen, and O oxygen.


14. Write out a system of equations required to balance the following
chemical reaction and solve.

$$\mathrm{PbN_6 + CrMn_2O_8 \rightarrow Cr_2O_3 + MnO_2 + Pb_3O_4 + NO}$$

where Pb represents lead, N nitrogen, Cr chromium, Mn manganese,
and O oxygen.


15. Find the null space for these Markov transition matrices.


i ils" 4S Were Er ent 
(a) |}O0 O O a ee ee ico 1 Gg 
1 Duin di yale oe a ee 
50 ap Pica tino es ad 4 000 0 
$9.50" 25° O wid ++ 400 0 
(a) 10 .25 .50 25 6 0443 0 0 
0 o 2 50 50| loo 4 44 0 
by) alae. hi wt 25 4 SO 0,0 Os § 4 
66 0 6-24 

'$ £0 6,0 6 $4060 0 
142000 404000 
0424200 0404 0 0 
P le 4 4. 8:0 Blo 04040 
000% 4% % 0004 0 3 
00 6 0 2 4% 000 060 % 4 


16. For each transition matrix A in Exercise 15, find the set of probability
vectors p such that the next-period vector p' (= Ap) is a stable
probability vector (as was done in Example 3).


17. Prove that if A is the 3-by-3 transition matrix of some Markov chain
and the null space of A is infinite, then either two columns of A are
equal or else one column is the weighted average of the other two
(one-half the sum of the other two).


18. Give a constraint equation, if one exists, on the vectors in the range of
the matrices in Exercise 5.


19. (a) For the matrix A in Exercise 5, part (a), find a range vector b in
which b, = 5.

(b) For the b in part (a), solve Ax = b (give the family of solutions
using Theorem 2).


20. Give a constraint equation, if one exists, on the vectors in the range of
the matrices in Exercise 6.


21. (a) For matrix A in Exercise 6, part (b), find a range vector b in which
b, = 5 and b, = 2.

(b) For the b in part (a), solve Ax = b [give the family of solutions
using Theorem 1, part (iv)].


22. (a) For matrix A in Exercise 6, part (c), find a range vector b in which
b, = b, = b, = 15.

(b) For the b in part (a), solve Ax = b [give the family of solutions
using Theorem 1, part (iv)].

(c) Repeat part (a) to find a range vector with b, = 20, b, = 10,
b, = 20.


23. Give a constraint equation, if one exists, on the probability vectors in
the range of the transition matrices in Exercise 15.


24. By looking at the transition matrix for the frog Markov chain, explain 
why the even-state probabilities in the next period must equal the odd- 
state probabilities in the next period. 


Hint: Show that this equality is true if we start (this period) from a 
specific state. 


25. Compute the constraint on the range of the following powers of the frog 
transition matrix. You will need a matrix software package. 
(a) A* (b) A° (c) A'® 


26. Show that the range R(A) of a matrix is a vector space. That is, if b 
and b’ are in R(A) (for some x and x’, Ax = b and Ax’ = b’), show 
that rb + sb’ is in R(A), for any scalars r, s. 


27. Using matrix algebra, show that if $\mathbf{x}_1$ and $\mathbf{x}_2$ are solutions to the matrix
equation Ax = b, then any linear combination $\mathbf{x}' = c\mathbf{x}_1 + d\mathbf{x}_2$, with
c + d = 1, is also a solution.


28. Show that the intersection $V_1 \cap V_2$ of two vector spaces $V_1$, $V_2$ is again
a vector space.


Section 5.2 Theory of Vector Spaces Associated with Systems of Equations


In this section we introduce basic concepts about vector spaces and use them
to obtain important information about the range and null space of a matrix.
Recall that a vector space V is a collection of vectors such that if $\mathbf{u}, \mathbf{v} \in V$,
then any linear combination $r\mathbf{u} + s\mathbf{v}$ is in V. In Section 5.1 we introduced
the range and null space of a matrix A:

$$\begin{aligned}
\text{Range}(A) &= \{\mathbf{b} : A\mathbf{x} = \mathbf{b} \text{ for some } \mathbf{x}\} \\
\text{Null}(A) &= \{\mathbf{x} : A\mathbf{x} = \mathbf{0}\}
\end{aligned}$$

We noted that Range(A) and Null(A) are both vector spaces.

In Examples 2, 3, and 4 of Section 5.1 we used the elimination process
to find a vector or pair of vectors that generated the null spaces of certain
matrices. For example, multiples of [-1, 2, -2, 2, -2, 1] formed the
null space of the frog Markov transition matrix. In Examples 7, 8, and 9 of
Section 5.1, we used elimination to find constraint equations that vectors in
the range must satisfy. For the frog Markov matrix, the constraint for range
vectors p was $p_1 + p_3 + p_5 = p_2 + p_4 + p_6$.




The number of vectors generating the null space and the number of constraint
equations for the range were dependent on how many pivots we made
during elimination. Our goal in this section is to show that the sizes of
Null(A) and Range(A) are independent of how elimination is performed.

The vector space V generated by a set $Q = \{\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_k\}$ of
vectors is the collection of all vectors that can be expressed as a linear
combination of the $\mathbf{q}_i$'s. That is,

$$V = \{\mathbf{v} : \mathbf{v} = r_1\mathbf{q}_1 + r_2\mathbf{q}_2 + \cdots + r_k\mathbf{q}_k,\ r_i \text{ scalars}\}$$

For example, if Q consists of the unit n-vectors $\mathbf{e}_i$ (with all 0's except for a
1 in position i), then V is the vector space of all n-vectors, that is, euclidean
n-space. Another name for a generating set is a spanning set.

The column space of A, denoted Col(A), is the vector space generated
by the column vectors $\mathbf{a}_j^C$ of A. When we write Ax = b as

$$\mathbf{a}_1^C x_1 + \mathbf{a}_2^C x_2 + \cdots + \mathbf{a}_n^C x_n = \mathbf{b}$$

we see that the system Ax = b has a solution if and only if b can be
expressed as a linear combination of the column vectors of A, or


Lemma 1. The system Ax = b has a solution if and only if b is in
Col(A). Equivalently, Col(A) = Range(A).


The components $x_i$ of the solution x give the weights in the linear
combination of columns that yield b. Note that Lemma 1 is true for any
m-by-n matrix A and any m-vector b.


Example 1. Refinery Problem as a Column
Space Problem


The refinery problem introduced in Section 1.2 involved three refineries
each producing different amounts of heating oil, diesel oil, and
gasoline from a barrel of crude oil. Production levels of each refinery
were sought to satisfy a vector of demands. The resulting system of
equations was

$$\begin{aligned}
\text{Heating oil:}\quad 20x_1 + 4x_2 + 4x_3 &= 500 \\
\text{Diesel oil:}\quad 10x_1 + 14x_2 + 5x_3 &= 850 \\
\text{Gasoline:}\quad 5x_1 + 5x_2 + 12x_3 &= 1000
\end{aligned}$$

But this system is just seeking to express the demand vector [500, 850,
1000] as a linear combination of the production vectors of the three
refineries. That is, we seek $x_1$, $x_2$, $x_3$ such that

$$x_1\begin{bmatrix} 20 \\ 10 \\ 5 \end{bmatrix}
+ x_2\begin{bmatrix} 4 \\ 14 \\ 5 \end{bmatrix}
+ x_3\begin{bmatrix} 4 \\ 5 \\ 12 \end{bmatrix}
= \begin{bmatrix} 500 \\ 850 \\ 1000 \end{bmatrix}$$
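The column-space view in this example can be illustrated with a short Python sketch: solve the system, then rebuild the demand vector as a weighted sum of the three production (column) vectors. The `solve` helper is a hypothetical Gauss-Jordan routine that assumes a square, nonsingular coefficient matrix.

```python
from fractions import Fraction

def solve(A, b):
    """Solve a square nonsingular system A x = b by Gauss-Jordan pivoting."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        p = M[c][c]
        M[c] = [x / p for x in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[c])]
    return [row[-1] for row in M]

A = [[20, 4, 4], [10, 14, 5], [5, 5, 12]]
b = [500, 850, 1000]

x = solve(A, b)                                  # weights on the columns
columns = [list(col) for col in zip(*A)]         # production vectors
combo = [sum(x[j] * columns[j][i] for j in range(3)) for i in range(3)]

print([float(v) for v in x])
print(combo == b)                                # the columns reproduce b
```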
Up to this point, solving a system Ax = b was viewed as a problem
about the rows of A, that is, about the equations specified by the rows.
Gaussian elimination involves forming linear combinations of the equations
(rows of A) to obtain a new reduced system that can be solved by back
substitution. Lemma 1 says that solving Ax = b can equally be viewed as
a problem about a linear combination of the columns of A. This vector
approach to solving Ax = b has an associated geometric picture.


System of Equations 


Consider the system of equations

$$\begin{aligned}
x_1 + x_2 &= 4 \\
x_1 - 2x_2 &= 1
\end{aligned}
\qquad\text{or}\qquad
x_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$$

Solving by elimination, we find that $x_1 = 3$ and $x_2 = 1$. Figure 5.3
graphs this solution in vector-space terms, showing the right-side vector
[4, 1] as a linear combination of the column vectors [1, 1] and
[1, -2]. Note that the picture gives no insight into why $x_1 = 3$,
$x_2 = 1$ is the solution.


To determine the size of Range(A), we analyze the structure of
Col(A), the column space of A, which by Lemma 1 equals Range(A). The
key question is: How many of the columns of A are actually needed to
generate Col(A)? Some columns in A may be redundant.

A set of vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ is called linearly dependent if one
of them can be expressed as a linear combination of the others. Another
way to say this is that there is a nonzero solution x to

$$x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n = \mathbf{0} \quad\text{or, equivalently,}\quad A\mathbf{x} = \mathbf{0} \qquad (1)$$


Figure 5.3 
where A is the matrix whose columns are the vectors $\mathbf{a}_i$. Linear dependence
is equivalent to (1) because if $x_i \neq 0$, we can rewrite (1) as

$$-x_i\mathbf{a}_i = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n$$

or

$$\mathbf{a}_i = -\frac{x_1}{x_i}\mathbf{a}_1 - \frac{x_2}{x_i}\mathbf{a}_2 - \cdots - \frac{x_n}{x_i}\mathbf{a}_n$$

where the sums on the right omit the ith term. So any $\mathbf{a}_i$ for which $x_i \neq 0$
can be written as a linear combination of the other $\mathbf{a}$'s.

A set of vectors is linearly independent if it is not linearly
dependent. For example, the columns of an identity matrix

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

are linearly independent. If vectors $\mathbf{a}_i$ are linearly independent, then the only
solution to $x_1\mathbf{a}_1 + \cdots + x_n\mathbf{a}_n = \mathbf{0}$ (i.e., Ax = 0) is x = 0.


Example 3. Example of a Linearly Dependent 
Set of Columns 


l l 
a: 


wees otLd 


So the columns of A are linearly dependent. 

The following method illustrates a systematic way to find this 
linear dependence. We perform elimination by pivoting on A. We pivot 
on entry (1, 1) and then on (2, 2): 


| =I | s|>a < ! 0 
i SZ l yo =F oe Ee FY 
where A* represents the reduced form of A.

Remember that x is a solution to Ax = 0 if and only if x is a
solution to A*x = 0. Equivalently, there is a linear dependence among
the columns of A if and only if there is a linear dependence among
the columns of A*. But the first two columns of A* are unit vectors
(they form the 2-by-2 identity matrix). So trivially, the third column
of A* is dependent on the first two:


4 
Consider the matrix A = } By inspection we see that 
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grands ‘6 oD) l 0 
geeves P=s[']e[) 


This is exactly the relationship that we found in (2).


If we were to apply the method in Example 3 to the coefficient matrix 
in the refinery problem of Example 1, elimination by pivoting would reduce 
the matrix to a 3-by-3 identity matrix. Since the columns of the identity 
matrix are trivially linearly independent, this means that the original columns 
in the refinery problem were linearly independent. 

We state the method used to find linear dependence in Example 3 and 
its consequences as a theorem. 


Theorem 1
(i) Let A be any m-by-n matrix and let A* be the reduced matrix
obtained from A using elimination by pivoting. Then a set of
columns of A is linearly dependent (linearly independent) if and
only if the corresponding columns in A* are linearly dependent
(linearly independent).

(ii) Any unpivoted column of A* (a column that was not reduced to
a unit vector) is linearly dependent on the set of columns containing
pivots.

(iii) The columns of A* with pivots are linearly independent. The
corresponding columns of A generate the column space of A.


The following example illustrates this method further. 

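Example 4. Redundant Columns in
Transportation Problem Constraints


In Example 4 of Section 5.1 we examined the following system of
equations (transportation problem constraints seen in Section 4.6),

$$\begin{aligned}
x_1 + x_2 &= 20 \\
x_3 + x_4 &= 30 \\
x_5 + x_6 &= 15 \\
x_1 + x_3 + x_5 &= 25 \\
x_2 + x_4 + x_6 &= 40
\end{aligned}$$

with coefficient matrix

$$A = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1
\end{bmatrix} \qquad (4)$$

In Section 5.1 we performed elimination by pivoting on A using entries
(1, 2), (2, 4), (3, 6), and (4, 3). The reduced matrix A* was

$$A^* = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 \\
-1 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \qquad (5)$$

First note that the last row of zeros can be ignored. Columns 2, 4, 6,
and 3 of A* (in that order) are the unit vectors of the 4-by-4 identity
matrix. Columns 1 and 5 of A* are each linearly dependent on the
four unit-vector columns. For example,

$$\mathbf{a}_1^* = \mathbf{a}_2^* - \mathbf{a}_4^* + \mathbf{a}_3^*: \qquad
\begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
- \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} \qquad (6)$$

The relation (6) among columns in A* is mirrored in A, where

$$\mathbf{a}_1 = \mathbf{a}_2 - \mathbf{a}_4 + \mathbf{a}_3: \qquad
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
- \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}
+ \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

Thus the columns where pivots were performed, columns $\mathbf{a}_2$, $\mathbf{a}_3$, $\mathbf{a}_4$,
and $\mathbf{a}_6$, generate the range of A.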
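The pivot sequence of this example can be replayed with the sketch below. It assumes the coefficient matrix of the transportation constraints as given in (4); the `pivot` and `col` helpers are hypothetical, and entries are converted from the text's 1-based positions to Python's 0-based indices.

```python
from fractions import Fraction

def pivot(M, r, c):
    """Pivot on entry (r, c), 0-based: make it 1 and clear the rest of column c."""
    p = M[r][c]
    M[r] = [x / p for x in M[r]]
    for i in range(len(M)):
        if i != r and M[i][c] != 0:
            f = M[i][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]

A = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
]
M = [[Fraction(x) for x in row] for row in A]

# Pivot entries (1, 2), (2, 4), (3, 6), (4, 3) in the text's numbering.
for r, c in [(1, 2), (2, 4), (3, 6), (4, 3)]:
    pivot(M, r - 1, c - 1)

def col(matrix, j):
    """Column j (1-based) of a matrix stored as a list of rows."""
    return [row[j - 1] for row in matrix]

def relation_holds(matrix):
    """Check a1 = a2 - a4 + a3 for the columns of the given matrix."""
    a1, a2, a3, a4 = (col(matrix, j) for j in (1, 2, 3, 4))
    return a1 == [p - q + s for p, q, s in zip(a2, a4, a3)]

for row in M:
    print([int(x) for x in row])
print(relation_holds(M), relation_holds(A))
```

The same column relation holding in both the reduced and the original matrix is what Theorem 1, part (i), asserts.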


From Theorem 1, part (iii), it follows that the number of pivots per- 
formed equals the number of linearly independent columns that generate the 
column space of A. 

A basis of a vector space V is a minimal-sized set of vectors that
generate V. Implicit in this definition is the fact that a basis is a set of
linearly independent vectors. As an example, the n coordinate vectors $\mathbf{e}_i$
form a basis for the space of all n-dimensional vectors. Since a basis is a
minimal-sized generating set, every generating set contains a basis. For
example, while the column space of a matrix A is defined to be generated
by the columns of A, only the pivot columns are needed to generate the
column space, as shown in Example 4.

The following result, which we prove in two ways, shows the theoretical
relationship among the concepts of basis, linear independence, and
unique solution.


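Proposition. If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space V, every
vector in V has a unique representation as a linear combination of the
$\mathbf{v}_i$'s.


Proof 1 (Using the Definition of Linear Independence): Suppose that
w is a vector in V that has two representations as a linear combination
of the $\mathbf{v}_i$. So

$$\mathbf{w} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n \quad\text{and}\quad
\mathbf{w} = b_1\mathbf{v}_1 + b_2\mathbf{v}_2 + \cdots + b_n\mathbf{v}_n$$

Then

$$\mathbf{0} = \mathbf{w} - \mathbf{w} = (a_1 - b_1)\mathbf{v}_1 + (a_2 - b_2)\mathbf{v}_2 + \cdots + (a_n - b_n)\mathbf{v}_n \qquad (7)$$

Since the $\mathbf{v}_i$'s form a basis and hence are linearly independent, the
linear combination of $\mathbf{v}_i$'s in (7) can only equal 0 if the terms $(a_i - b_i)$
are all zero. So the two representations must be the same.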


Proof 2 (Using Elimination). To find the representation of w in terms
of the $\mathbf{v}_i$'s, we solve the system of equations for the $x_i$'s:

$$x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_n\mathbf{v}_n = \mathbf{w} \quad\text{or}\quad A\mathbf{x} = \mathbf{w} \qquad (8)$$

where A has the $\mathbf{v}_i$'s as its columns. Since the $\mathbf{v}_i$'s are a basis and
hence linearly independent, we can pivot in every column [otherwise,
by Theorem 1, part (ii), each unpivoted column is linearly dependent
on the pivot columns]. Then Ax = w has a unique solution (see the
summary at the end of Section 5.1).


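Proof 2 shows us how to compute the unique representation of a vector
w in terms of the $\mathbf{v}_i$'s: simply solve (8). For example, if $\mathbf{v}_1 = [1, 2, 3]$ and
$\mathbf{v}_2 = [0, -1, 2]$ are a basis for vector space V and w = [3, 8, 5] is in V,
then to determine the right linear combination of the $\mathbf{v}_i$'s to get w, we solve
the system

$$\begin{bmatrix} 1 & 0 \\ 2 & -1 \\ 3 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 3 \\ 8 \\ 5 \end{bmatrix}
\quad\rightarrow\quad
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 3 \\ -2 \\ 0 \end{bmatrix} \qquad (9)$$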
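The computation in (9) can be reproduced with the following sketch, which reduces the augmented matrix [A | w] whose columns are the basis vectors followed by w. The `reduce_system` routine is a hypothetical Gauss-Jordan helper using exact fractions.

```python
from fractions import Fraction

def reduce_system(rows):
    """Gauss-Jordan elimination on an augmented matrix [A | w]."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0]) - 1
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        p = M[r][c]
        M[r] = [x / p for x in M[r]]
        for i in range(n_rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return M

v1, v2, w = [1, 2, 3], [0, -1, 2], [3, 8, 5]

# Rows of [v1 v2 | w]: one row per component.
augmented = [[a, b, c] for a, b, c in zip(v1, v2, w)]
reduced = reduce_system(augmented)

x1, x2 = reduced[0][-1], reduced[1][-1]          # coordinates of w
rebuilt = [x1 * a + x2 * b for a, b in zip(v1, v2)]
print(x1, x2)
print(rebuilt == w)
```

The zero last row of the reduced matrix reflects that w really does lie in the space generated by the two basis vectors.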
Now we prove a critical vector-space lemma that resolves the question
of whether all pivot sequences have the same length.


Lemma 2. All linearly independent sets of vectors that generate a given
vector space V have the same size. Any such set is a basis for V.


Proof. Let $S = \{\mathbf{s}_i\}$ and $T = \{\mathbf{t}_i\}$ be two sets of linearly independent
vectors that generate V. Suppose that S and T have different sizes; for
concreteness, let S have four vectors and T have five vectors. Then we
can use linear combinations of the vectors of S to represent the vectors
in T. If $\mathbf{t}_1 = c_1\mathbf{s}_1 + c_2\mathbf{s}_2 + c_3\mathbf{s}_3 + c_4\mathbf{s}_4$, define the S-coordinate vector
of $\mathbf{t}_1$ to be $[c_1, c_2, c_3, c_4]$. Consider the equation, defined in (1), for
dependence of the $\mathbf{t}_i$ (with the $\mathbf{t}_i$ represented in S-coordinates):

$$x_1\mathbf{t}_1 + x_2\mathbf{t}_2 + x_3\mathbf{t}_3 + x_4\mathbf{t}_4 + x_5\mathbf{t}_5 = \mathbf{0} \qquad (10)$$

Since the $\mathbf{t}_i$ are four-dimensional vectors (in S-coordinates), (10) is a
system with four equations in five variables. Solving (10) by elimination
by pivoting leaves at least one unpivoted column and hence by
Theorem 1, part (ii), there is linear dependence among the five $\mathbf{t}_i$.
This contradicts the linear independence of T.


The dimension of a vector space V, written dim(V), is the number of
vectors in a basis for V. For example, the set of n unit vectors e_i is a basis
for "n-dimensional space"; thus this space does indeed have dimension n.


Combining Lemma 2 with Theorem 1, part (iii), we have


Theorem 2 
(i) The columns of A used in a pivot sequence are a basis for the 
range of A. 
(ii) All pivot sequences have the same size; the size is the dimension 
of the range of A. 


By Theorem 2, it now makes sense to talk about the number of pivots 
in a pivot sequence. The rank of a matrix A, written rank(A), is the number 
of pivots in any pivot sequence. 

Corollary 
Rank(A) = Dim(Range(A)) 


We have been concerned about which sets of columns of A are linearly
dependent, that is, when there is a nonzero x so that


x_1 a_1^C + x_2 a_2^C + ... + x_n a_n^C = 0 or Ax = 0 (11)


Such an x in (11) is a vector in Null(A), the null space of A.
If A* is the reduced matrix, then we know that x is a solution of
Ax = 0 if and only if it is a solution to A*x = 0. Thus Null(A*) equals
Null(A).


Example 5. Relation Between Null Space and
Column Space


Consider the matrix 


and perform elimination by pivoting. 
Pivoting on entry (1, 1) yields 


    [1  2  3  1]
    [0 -4 -4 -4]
    [0  7  7  7]


Pivoting on entry (2, 2) yields


         [1  0  1 -1]
    A* = [0  1  1  1]
         [0  0  0  0]


Clearly, the first two columns of A*, the pivot columns, are a 
basis for the column space of A*—they are linearly independent and 
generate Col(A*). Then by Theorem 1, part (iii), the first two columns 
in A generate Col(A). 

Since Null(A*) = Null(A), a basis for Null(A*) will be a basis 
for Null(A). Looking at A*, we see that 


a_3 = a_1 + a_2 and a_4 = -a_1 + a_2 (12)


(where a_i denotes the ith column of A*). The vector equations in (12)
can be rewritten as


-a_1 - a_2 + a_3 = 0 or A*[-1, -1, 1, 0] = 0
                                                   (13)
 a_1 - a_2 + a_4 = 0 or A*[1, -1, 0, 1] = 0


Let x_1^0 = [-1, -1, 1, 0] and x_2^0 = [1, -1, 0, 1]. Since x_1^0 and
x_2^0 are linearly independent (look at their last two entries) and generate
Null(A*), they form a basis for Null(A*) and hence for Null(A).


In general, given a reduced matrix A*, each unpivoted column can be
expressed as a linear combination of the pivoted columns, which are unit
vectors in A* [see (12)]. This linear combination yields a solution to
A*x = 0, as shown in (13). If unpivoted column h of A* has an entry
a'_ih in the ith row, the solution x_j^0 we obtain (for the jth unpivoted column) is


x_j^0 = [-a'_1h, -a'_2h, ..., -a'_mh, 0, ..., 0, 1, 0, ..., 0] (14)


where the entries for unpivoted columns are all 0 except for entry h, which
is 1 (assuming pivots were performed in the first m rows and m columns).
For example, in Example 5, the fourth column of A* begins


    [-1]
    [ 1]


(in the two pivot rows), so x_2^0 = [1, -1, 0, 1]. The entries in (14) from
the unpivoted columns form a unit vector, so the x_j^0's are linearly
independent and form a basis of the null space of A.
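The recipe in (14) translates directly into code. The following Python sketch is illustrative only: the function name null_space_basis and its pivot_cols argument are hypothetical, and the matrix A* is the reduced matrix of Example 5 as implied by the column relations in (12).

```python
def null_space_basis(reduced, pivot_cols):
    """Build one null-space vector per unpivoted column of a reduced matrix.

    reduced[i] is row i of A*; pivot_cols[i] is the column that holds the
    pivot of row i (so the pivot columns of A* are unit vectors)."""
    n = len(reduced[0])
    basis = []
    for h in range(n):
        if h in pivot_cols:
            continue
        x = [0] * n
        x[h] = 1  # entry for unpivoted column h is 1, other unpivoted entries stay 0
        for i, p in enumerate(pivot_cols):
            x[p] = -reduced[i][h]  # negated entry of column h in pivot row i
        basis.append(x)
    return basis

A_star = [[1, 0, 1, -1],
          [0, 1, 1, 1],
          [0, 0, 0, 0]]
basis = null_space_basis(A_star, pivot_cols=[0, 1])
print(basis)

# Each basis vector should satisfy A* x = 0.
residuals = [[sum(a * b for a, b in zip(row, x)) for row in A_star] for x in basis]
print(residuals)
```

With four columns and two pivots, the sketch produces two vectors, which agrees with the count n - rank(A).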

Observe that every column in A* is now either (i) a unit vector that is 
in the basis of the column space; or else (ii) gives rise to a vector in the 
basis of the null space. That is, every column contributes to the size of the 
range of A or to the diversity of different solutions possible to Ax = b, for 
a given b. 


Theorem 3. Let A be a matrix with n columns. The vectors x_j^0 in (14)
corresponding to unpivoted columns form a basis for Null(A).
Furthermore,


dim(Range(A)) + dim(Null(A)) = n 


Corollary A
Dim(Null(A)) = n - dim(Range(A))
             = n - rank(A)


Corollary B. Any solution x' to Ax = b can be written in the form

x' = x* + r_1 x_1^0 + r_2 x_2^0 + ... + r_k x_k^0 (15)


where x* is a given particular solution to Ax = b and the x_j^0's are as
given in (14).

Proof of Corollary B. By Theorem 1, part (iv) of Section 5.1, any
solution x' can be written x' = x* + x^0, the sum of a particular
solution x* and some null space vector x^0. Since the x_j^0 generate
Null(A), any such x^0 is some linear combination of the x_j^0's.


We complete our brief survey of vector spaces of A with Row(A), the 
vector space generated by the rows of A. As noted when elimination was 
introduced in Section 3.2, the elimination process repeatedly replaces a row 
with a linear combination of rows. When rows are zeroed out in the elimi- 
nation process, they are linearly dependent on the preceding rows in which 
pivots were performed. Conversely, every nonzero row in A* is a pivot row 
(where a pivot was performed). 

Because the submatrix of A* formed by the pivot rows and pivot 
columns is an identity matrix, these pivot rows are linearly independent (see 
Exercises for details) and will be shown shortly to form a basis for 
Row(A). Hence the dimension of the row space equals rank(A) (= number 
of pivots). 


Theorem 4. Let A be any m-by-n matrix. The maximum number of linearly 
independent rows in A and the maximum number of linearly inde- 
pendent columns in A are equal. Both are rank(A). That is, 


dim(Row(A)) = rank(A) = dim(Col(A)) 


The results in Theorems 2, 3, and 4 yield several more equivalent 
conditions for when a system of equations has a unique solution. 


Theorem 5. Let A be an n-by-n matrix. The system Ax = b has a unique 
solution, for any b, if and only if any of the following equivalent 
conditions are satisfied. 

(i) The dimension of Range(A) is n.
(ii) The column vectors of A are linearly independent.
(iii) The dimension of Row(A) is n.
(iv) The row vectors of A are linearly independent.
(v) The null space of A has dimension 0 (consists of only the 0
vector).


The following example illustrates the uses of Theorem 5. 


Example 6. Row Space Test for Unique Solution 


Let us consider the following variation of our refinery model introduced 
in Section 1.2. Suppose that we change the numbers in gasoline pro- 
duction so that the third row is the sum of the first two rows. 


Heating oil: 20x_1 + 4x_2 + 4x_3 = 500
Diesel oil: 10x_1 + 14x_2 + 5x_3 = 850
Gasoline: 30x_1 + 18x_2 + 9x_3 = 1000

Once we observe that the last row in the coefficient matrix is the sum
(a linear combination) of the first two rows, so that the rows are not 
linearly independent, then we know by Theorem 5, part (iv), that there 
will not be a unique solution to this production problem: either no 
solution or multiple solutions. If we tried to perform elimination, we 
would only be able to pivot twice (if we could pivot in all three rows, 
they would have to be linearly independent). 

Theorem 4 tells us that if the rows are dependent, the columns
also are. However, that column dependence is far from obvious.
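To make the "far from obvious" column dependence concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the rank function below is a hypothetical helper that counts pivots by elimination with exact fractions, and the weights -3, -5, 20 were found by solving Ax = 0 by hand for this matrix.

```python
from fractions import Fraction

def rank(matrix):
    """Count the pivots found by elimination, using exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in matrix]
    pivots = 0
    for col in range(len(M[0])):
        # Look for a usable pivot in this column at or below the current row.
        r = next((i for i in range(pivots, len(M)) if M[i][col] != 0), None)
        if r is None:
            continue  # unpivoted column
        M[pivots], M[r] = M[r], M[pivots]
        for i in range(pivots + 1, len(M)):
            f = M[i][col] / M[pivots][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[pivots])]
        pivots += 1
        if pivots == len(M):
            break
    return pivots

A = [[20, 4, 4],
     [10, 14, 5],
     [30, 18, 9]]
print(rank(A))  # only two pivots are possible

# A dependence among the columns: -3 (col 1) - 5 (col 2) + 20 (col 3) = 0.
x = [-3, -5, 20]
check = [sum(a * b for a, b in zip(row, x)) for row in A]
print(check)
```

The same function applied to the original refinery matrix (third row 5, 5, 12) finds three pivots, consistent with that system having a unique solution.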


We now give a theoretical application of Theorems 3 and 4. Suppose
that A is m-by-n, where m < n. All the columns cannot be linearly
independent, since dim(Col(A)) = dim(Row(A)) and there are only m rows,
m < n. Therefore, rank(A) <= m < n. We conclude from Theorem 3,
dim(Null(A)) = n - rank(A) = n - dim(Row(A)) > 0. Then Null(A) is
infinite and Ax = b cannot have a unique solution:


Theorem 6. If A is an m-by-n matrix, where m < n, the system Ax = b 
can never have a unique solution (either multiple solutions or no 
solution). 


We close this section with a discussion of another way to interpret the 
elimination process and the rank of a matrix. To do this, we must introduce 
the concept of a simple matrix. 

A simple matrix K is formed by the product c * d of two vectors c
and d in which c is treated as an m-by-1 matrix and d as a 1-by-n matrix.
Thus entry k_ij of K equals c_i d_j. We refer to this product c * d as a matrix
product of vectors. For example, the following matrix product of vectors


yields a simple matrix. 
3 
Ee i [1, 2, 3] 


< Pap sue Ss a 
ae) (ay es, ede ane Wee, 


ay! lee et, 
|-1 -2 -3 


All rows in a simple matrix are multiples of each other, and similarly
for columns. If we pivot on an entry (i, j) in a simple matrix, the elimination
computation will convert all other rows to 0's (verification is left as an
exercise). This means that simple matrices have rank 1.

Simple matrices will be used extensively in Section 5.5. For now, the 
property of simple matrices of interest is 




Theorem 7. Let C be an m-by-r matrix with columns c_i^C and D be an r-by-n
matrix with rows d_i^R. Then the matrix multiplication CD can be
decomposed into a sum of the simple matrices c_i^C * d_i^R of the column
vectors of C times the row vectors of D.


CD = c_1^C * d_1^R + c_2^C * d_2^R + ... + c_r^C * d_r^R (17)


One way to verify (17) is using the rules for partitioned matrices, that
is, we partition C into r m-by-1 matrices (the c_i^C) and partition D into r
1-by-n matrices (the d_i^R); see Exercise 15 of Section 2.6 for details.

We illustrate this theorem with the following product of two matrices. 


Example 7. Decomposition of Matrix
Multiplication into a Sum of
Simple Matrices


Let

    C = [1 2 3]    and    D = [11 12 13]
        [4 5 6]               [14 15 16]
                              [17 18 19]


Then

    CD = [1x11 + 2x14 + 3x17   1x12 + 2x15 + 3x18   1x13 + 2x16 + 3x19]      (18a)
         [4x11 + 5x14 + 6x17   4x12 + 5x15 + 6x18   4x13 + 5x16 + 6x19]

       = [1x11 1x12 1x13] + [2x14 2x15 2x16] + [3x17 3x18 3x19]              (18b)
         [4x11 4x12 4x13]   [5x14 5x15 5x16]   [6x17 6x18 6x19]

       = [1] * [11 12 13] + [2] * [14 15 16] + [3] * [17 18 19]              (18c)
         [4]                [5]                [6]

       = c_1^C * d_1^R + c_2^C * d_2^R + c_3^C * d_3^R


The first simple matrix c_1^C * d_1^R in (18c) is a matrix containing the first
term of each scalar product in the entries of CD in (18a). Similarly
for the second and third simple matrices.
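The decomposition in Example 7 is easy to check numerically. The Python sketch below is illustrative only; the helper names outer, matmul, and add are hypothetical and are not notation from the text.

```python
def outer(c, d):
    """Simple matrix c * d: entry (i, j) is c[i] * d[j]."""
    return [[ci * dj for dj in d] for ci in c]

def matmul(C, D):
    """Ordinary matrix product, entry by entry as scalar products."""
    return [[sum(C[i][k] * D[k][j] for k in range(len(D)))
             for j in range(len(D[0]))] for i in range(len(C))]

def add(X, Y):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

C = [[1, 2, 3],
     [4, 5, 6]]
D = [[11, 12, 13],
     [14, 15, 16],
     [17, 18, 19]]

# Sum the simple matrices (kth column of C) * (kth row of D).
total = [[0] * len(D[0]) for _ in C]
for k in range(len(D)):
    column_k = [row[k] for row in C]
    total = add(total, outer(column_k, D[k]))

print(total)
print(total == matmul(C, D))
```

The loop adds the three simple matrices of (18c) one at a time, and the final comparison confirms that their sum agrees with the usual row-times-column product.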


We now show how an m-by-n matrix A can be decomposed into a sum
of k simple matrices, where k = rank(A). Another way to say this is that
we subtract a set of simple matrices from A to eliminate all entries in A (to
reduce A to the 0 matrix).

Our strategy will be to form a simple matrix K_1 = l_1 * u_1 whose first
row equals a_1^R (the first row of A) and whose first column equals a_1^C (the
first column of A). Then A - K_1 will have 0's in its first row and column.
We form K_2 to remove the second row and column of A; possibly we zero
out additional rows and columns in the process. We continue similarly with
K_3, and so on.

Let u_1 = a_1^R and let

l_1 = (1/a_11) a_1^C = [a_11/a_11, a_21/a_11, a_31/a_11, ...]


(actually the first entry of l_1 is 1 = a_11/a_11). Then the entries of the first row
of l_1 * u_1 are 1 (first entry of l_1) times u_1, which equals u_1 (= a_1^R), as
required. And the entries in the first column of l_1 * u_1 are l_1 times the first
entry of u_1, a_11. We have (shown for three rows and columns)


    [    1    ]                           [a_11   a_12             a_13          ]
    [a_21/a_11] * [a_11, a_12, a_13]  =   [a_21   a_21 a_12/a_11   a_21 a_13/a_11]
    [a_31/a_11]                           [a_31   a_31 a_12/a_11   a_31 a_13/a_11]


So A - K_1 (= A - l_1 * u_1) has zeros in the first row and column.
Observe that outside the first row and column, the new entry (i, j) in
A - K_1 equals

a_ij - (a_i1/a_11) a_1j


Surprise! This is our old friend the elimination operation [when we pivot on
entry (1, 1)]. Vector l_1 is just the first column in the matrix L of elimination
multipliers from the A = LU decomposition, and u_1 is the first row of U
(which equals the first row of A). So we have shown that when we subtract
from A the simple matrix l_1^C * u_1^R formed by l_1^C (the first column of L) and
u_1^R (the first row of U), we obtain a matrix with 0's in the first row and first
column. This new matrix is just the coefficient matrix (ignoring the first row
of 0's) for the remaining n - 1 equations in Ax = b when we pivot on
entry (1, 1).

Repeating this argument, we let K_2 = l_2^C * u_2^R; subtracting K_2 from
A - K_1 will have the effect of next pivoting on entry (2, 2), and zeroing
out the second row and column. The other K_i are defined and perform
similarly, so ultimately we see that


A = l_1^C * u_1^R + l_2^C * u_2^R + ... + l_k^C * u_k^R (19)


Example 8. Refinery Matrix Expressed as a Sum
of Simple Matrices


In Section 3.2 we gave the LU decomposition of our refinery matrix


        [20  4  4]   [ 1    0   0] [20  4  4]
    A = [10 14  5] = [1/2   1   0] [ 0 12  3]
        [ 5  5 12]   [1/4  1/3  1] [ 0  0 10]

By (19), we can write A as


    A = l_1^C * u_1^R + l_2^C * u_2^R + l_3^C * u_3^R

        [ 1 ]                [ 0 ]                [0]
      = [1/2] * [20 4 4]  +  [ 1 ] * [0 12 3]  +  [0] * [0 0 10]
        [1/4]                [1/3]                [1]

        [20 4 4]   [0  0 0]   [0 0  0]
      = [10 2 2] + [0 12 3] + [0 0  0]                               (20)
        [ 5 1 1]   [0  4 1]   [0 0 10]

The reader should check that this set of three simple matrices
adds up to A.
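The check suggested at the end of Example 8 can also be carried out by a short program that peels simple matrices off A one pivot at a time. This Python sketch is illustrative only: the function name simple_matrix_terms is hypothetical, and it assumes that every pivot met along the way is nonzero, so no row exchanges are needed.

```python
from fractions import Fraction

def simple_matrix_terms(A):
    """Return pairs (l, u) with A equal to the sum of the simple matrices l * u.

    Assumes each pivot entry (k, k) of the remaining matrix is nonzero;
    this sketch does not handle row exchanges."""
    R = [[Fraction(v) for v in row] for row in A]  # what is left of A
    terms = []
    for k in range(min(len(R), len(R[0]))):
        if all(v == 0 for row in R for v in row):
            break  # A has been reduced to the 0 matrix
        pivot = R[k][k]
        if pivot == 0:
            raise ValueError("zero pivot: row exchanges are not handled")
        l = [row[k] / pivot for row in R]   # column of elimination multipliers
        u = list(R[k])                      # current pivot row
        K = [[li * uj for uj in u] for li in l]
        R = [[a - b for a, b in zip(ra, rk)] for ra, rk in zip(R, K)]
        terms.append((l, u))
    return terms

A = [[20, 4, 4],
     [10, 14, 5],
     [5, 5, 12]]
terms = simple_matrix_terms(A)
for l, u in terms:
    print([str(v) for v in l], [str(v) for v in u])

# Add the simple matrices back up and compare with A.
rebuilt = [[sum(l[i] * u[j] for l, u in terms) for j in range(3)] for i in range(3)]
print(rebuilt == A)
```

The three (l, u) pairs printed are the columns of L and rows of U shown in Example 8, and the number of pairs equals rank(A) = 3.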


Theorem 8. Gaussian elimination can be viewed as a decomposition of A 
into a sum of rank(A) simple matrices: 


A = l_1^C * u_1^R + l_2^C * u_2^R + ... + l_k^C * u_k^R (21)


where k = rank(A), l_i^C is the ith column of L (the matrix of elimination
multipliers), and u_i^R is the ith row of U (the reduced matrix in Gaussian
elimination).

The minimum number of simple matrices whose sum equals ma- 
trix A is rank(A). 


The last sentence of Theorem 8 is proved in Exercise 32. The sym- 
metric role of columns and rows in Theorem 8 explains why the dimensions 
of the row and column spaces of a matrix are equal. 

It is not hard to show (see Exercise 31) that if A has the simple-matrix
decomposition A = c_1 * d_1 + ... + c_k * d_k, with k = rank(A), then the c_i are a basis for
the column space of A and the d_i are a basis for the row space of A. It
follows that


Corollary 
(i) The nonzero rows of U generate the row space of A and the 
nonzero (below main diagonal) columns of L generate the column 
space of A. 
(ii) Theorem 8 reproves the fact that the dimension of the column 
space of A equals the dimension of the row space of A, equals 
rank(A). 


From Theorem 7 it follows that if A equals the sum of simple matrices
l_i^C * u_i^R, then A = LU; that is, we have proved the LU decomposition.

The decomposition (19) of a matrix A into a sum of rank(A) simple 
matrices is of more theoretical than practical interest. 
Optional (Based on Section 4.6)


Let us reinterpret the simplex algorithm of linear programming using the 
concept of a basis for the column space. When slack variables were added, 
as say to the table—chair production problem in Section 4.6, the form of the 
constraint equations was 


Si Ray ee RE AE 

Objective Function | 40__200__0__0__ O_O. __0 
Wood | SY FU oO 8) 00 22) 
Labor 2 s Bel © OB Zae 
Braces b a. OMe f .O ) S860 
Upholstery 2 Os. O oD fF 1800 


Observe that the columns associated with the slack variables x_3, x_4, x_5, x_6
form an identity matrix and hence are the basis for the column space. For
this reason, x_3, x_4, x_5, x_6 are called basic variables and variables x_1, x_2
nonbasic, for the linear program (22). Clearly, the columns of nonbasic
variables x_1, x_2 in (22) are linearly dependent on the basic variables'
columns. Recall that the simplex algorithm sets the nonbasic variables equal to
0 so that the basic variables then have nonnegative values equal to the
corresponding right-side entry.

The pivot step in the simplex algorithm can be viewed as picking some
nonbasic variable to enter the basis while a basic variable leaves the basis.
For (22), we chose x_2 (whose coefficient 200 in the objective function is
largest) to enter and x_5 to leave the basis. After pivoting, we have


ae Se eR ag By Ea 
Objective Function | 4’__0__0__0__—*3__0__ —60,000 
Wood 4. VE Oe Se 200 
7 1 (23) 
Labor 7 we® O93 ae” 1,100 
Braces -- 2°00 iz O| 300 
Upholstery 2. eR Ue @ 0 | 1,800 


Note that now the columns of x_2, x_3, x_4, x_6 form the basis for the column
space.

Whereas our discussion in Section 4.6 focused on which were the 
independent (nonbasic) variables, the traditional approach is to concentrate 
on which are the basic variables. 


Section 5.2 Exercises 


Summary of Exercises 

Exercises 1—10 involve the column space, linear dependence, and generators 
of the column space. Exercises 11—25 involve associated theory. Exercises 
26—32 involve simple matrices and the representation of a matrix as a sum 
of simple matrices. 
1. Three soup factories F_1, F_2, and F_3 generate production vectors f_1 =
[20, 100, 20], f_2 = [200, 0, 50], and f_3 = [0, 100, 200], of the amounts
(in gallons) of tomato, chicken, and split-pea soup produced each hour.
If the demand is d = [5000, 3000, 3000], write a system of equations
for determining the right linear combination (how long each factory
should work) of production vectors to meet the demand. Determine the
weights.


2. There are three refineries producing heating oil, diesel oil, and gasoline.
The production vector of refinery A (per barrel of crude oil) is
[10, 5, 10] and of refinery B is [4, 11, 8]. The production vector for 
refinery C is the average of the vector for refineries A and B. If the 
demand vector is [380, 370, 460], write a system of equations for 
finding the right linear combination of refinery production vectors to 
equal the demand vector. Find the set of such linear combinations. 


3. For each of the following sets of vectors, express the first vector as a
linear combination of the remaining vectors if possible.

(a) [1, 1): (2, 1),2, -l =) LB, 2]: (2, —3], [—3, 6] 
(c) [3, —1]: (1, 3], [-—2, 3] 

(a) [4,751 G2, 1, Oly 10,-4,. 24 3, 2,. 1) 


4. For each of the following pairs of a matrix and a vector, express the
vector as a linear combination of the columns of the matrix. Plot this
linear combination as was done in Figure 5.3.


SIE) ob EE 


5. The first column in the inverse A^{-1} of a 2-by-2 matrix A gives the
weights in a linear combination of A's columns that equals e_1 =
[1, 0]. The second column in A^{-1} gives the weights in a linear
combination of A's columns that equals e_2 = [0, 1]. Find these weights
for expressing e_1 and e_2 as linear combinations of the columns and plot
the linear combinations as in Figure 5.3 for the following matrices.


4 0 aes 2 3 
(a) ' , (0) b : (c) k 4 


6. Tell which of the following sets of vectors are linearly independent. If
linearly dependent, express one vector as a linear combination of the 
others. 

(a) [4's 2], [= <2, 4] (b) il, 3], [3, 7 1] 

(eo) PBy 0, E23 Sy FS] (ay [24s eh. T— 2,05, = 2]: (bz. 1, 2) 
(eo) (2, 1, Ol, 11-14, SIA 10; 23-1] 


7. Find a set of columns that form a basis for the column space of each
of the following matrices (use the reduced matrix A* as in Examples 3 
and 4). Give the rank of each matrix. 


| a es | ee. 
a cae SyPasr - S043 
a. 4 

my Mould aa 

el * mili =2 Fe 

i es a 


8. For each matrix in Exercise 7, find all sets of columns that form a basis
for the column space. 


9. Find a set of columns that form a basis for the column space of each
of the following matrices. Give the rank of each matrix. Also find a 
basis for the null space of each matrix. 


4 2 Ds Fez 
CAE be Leis my lt 2 % 
pau: 
a ne bo -s Sy 
; O° a 1 4 Tike ee TS 
ice 10 10 1 He ON ig a a en 
>107 4 oe ee 
Ct. adie 1. A SSA 
‘Wee Gr ie Or an 
O49 6 1 O0n 
i; oh aoe O 
aide (ae Wes Cas ae 
o e663 ated 
io: 8. 0e 2 


10. Let A be a coefficient matrix in a refinery problem, as in Example 1,
with each column representing the production vector of a refinery. Ex- 
plain the practical significance of having one column be a linear com- 
bination of the others. What constraints and what freedom does this 
permit the manager of the refineries? 


11. For an m-by-n matrix A, the reduced matrix A* can be written in the
partitioned form

    A* = [I  R]
         [O  O]

where I is an r-by-r identity matrix (r = rank(A)) and R is r-by-(n - r).
Using the submatrix R and an appropriate size identity matrix I, give a
matrix N in partitioned form whose columns are the basis of Null(A).


Hint: See expression (14). 
12. Determine the rank of matrix A, if possible, from the given information.

(a) A is an n-by-n matrix with linearly independent columns. 

(b) A is a 6-by-4 matrix and Null(A) = {0}. 

(c) A is a 5-by-6 matrix and dim(Null(A)) = 3. 

(d) A is a 3-by-3 matrix and det (A) = 17. 

(e) A is a 5-by-5 and dim(Row(A)) = 3. 

(f) A is an invertible 4-by-4 matrix. 

(g) A is a 4-by-3 matrix and Ax = b has either a unique solution or 
else no solution. 

(h) A is a 8-by-8 matrix and dim(Row(A’)) = 6. 

(i) A is a 7-by-5 matrix in which dim(Null(A’)) = 3. 


13. In this exercise, the reader should try to find by inspection a linear
dependence among the rows of each matrix. If dependence is found, 
use elimination by pivoting to find a linear dependence among the col- 
umns (as in Examples 3 and 4). 


yk & te! Zeer Sf 9 
(ay 1D, 1a (er 2 hh, (c) |}4 5 6 
+ or ae 2.4 2 he es 


14. Show that the number of pivots performed in Gaussian elimination will
be the same as the number of pivots in elimination by (full) pivoting. 
(Thus the rank of a matrix can be defined in terms of either type of 
elimination. ) 


15. Let A* be the reduced-form matrix of A.

(a) Show that nonzero rows of A* generate Row(A) (i.e., linear com-
binations of the rows of A* generate the same vectors as linear 
combinations of rows of A). 

(b) Show that the nonzero rows of A* must be linearly independent. 
Hint: Look at the form of A*. 

(c) Conclude that the nonzero rows of A* are a basis of Row(A) and 
hence that the dim(Row(A)) = rank(A). 


16. Let U be the upper triangular matrix produced at the end of Gaussian

elimination on the matrix A. 

(a) Show that nonzero rows of U generate Row(A) (i.e., linear com- 
binations of the rows of U generate the same vectors as linear 
combinations of rows of A). 

(b) Show that the nonzero rows of U must be linearly independent. 
Hint: Look at the form of U. 

(c) Conclude that the nonzero rows of U are a basis of Row(A) and 
hence that the dim(Row(A)) = rank(A). 


17. (a) Suppose that the rows of A are linearly dependent. Show that at
the end of Gaussian elimination, the resulting upper triangular ma- 
trix U will have at least one row of zeros. 

(b) Suppose that A is a square matrix with linearly dependent columns. 
Show that at the end of Gaussian elimination, the resulting matrix
U will have at least one row of zeros.


Hint: Use part (a) and Theorem 5. 


18. Show that Row(A^T) (A^T is the transpose of A) equals Col(A) and
that Col(A^T) equals Row(A). Show that rank(A) = rank(A^T).


19. (a) Use the results of Exercises 17 and 18 to show that the nonzero
rows in the reduced form (A^T)* of A^T are a basis for Col(A).

(b) Use part (a) to compute a basis for Col(A) for the matrix A in 
Example 3. 

(c) Repeat part (b) for the matrix in Example 4. 


20. (a) Show that the rank of a matrix does not change when a multiple
of one row is subtracted from another row. 

(b) Show that the rank of a matrix does not change when a multiple 
of one column is subtracted from another column. 


Hint: Use A^T.


21. (a) Show that if A is an m-by-n matrix and b an m-vector, b is in
Range(A) [= Col(A)] if and only if rank([A b]) = rank(A),
where [A b] denotes the augmented m-by-(n + 1) matrix with b
added as an extra column to A.

(b) If Ax = b has no solution, show that rank([A b]) must be
rank(A) + 1.


22. Let A be an n-by-n matrix. Show that det(A) = 0 if and only if the
rows of A are linearly dependent or if the columns of A are linearly 
dependent. 


Hint: Use Theorem 5 and Theorem 4 of Section 3.3. 


23. This exercise examines the vectors in the column space of two matrices
A and B, that is, vectors in Col(A) ∩ Col(B). If d is such a vector,
then Ax' = d and Bx'' = d, for some x', x''. Show that if C = [A -B]
and x* = [x' x''], then d is in Col(A) ∩ Col(B) if and only if x* is
in Null(C).


24. Show that any set H of k linearly independent n-vectors, k < n, can be
extended to a basis for all n-vectors. 


Hint: Form an n-by-(k + n) matrix A whose first k columns come from 
H and whose last n columns are the identity matrix—thus dim(Col(A)) 
= n; show that a basis for Col(A) using the elimination by pivoting 
approach in Example 5 will include the columns of H. 


25. Show that λ is an eigenvalue of A if and only if det(A - λI) = 0.


Hint: "If" part is immediate; for the "only if" part, use Theorem 3
and Theorem 4 of Section 3.3.
26. Let a = [1, 2, 3], b = [2, 0], c = [-1, 2, 1]. Compute the following
simple matrices.
(a) a * b (b) a * c (c) c * c


27. Verify that the simple matrices in Exercise 26 have rank 1.


te? 
(a) Show that A = i is a simple matrix by giving the two 


2 


vectors whose matrix product is A. 
(b) Repeat part (a) for 


        [12 -6  9]
    B = [ 8 -4  6]
        [ 4 -2  3]


29. Write each of the following matrices as the sum of two simple matrices.


te : bee 
@™) | 3 (b) |3 4 5 
78 9 


30. (a) Find the LU decomposition of the matrix in Exercise 9, part (b) and
use the decomposition to write the matrix as the sum of two simple 
matrices as in Example 8. 

(b) Repeat part (a) for the matrix in Example 5. 

(c) Repeat part (a) for the matrix in Exercise 9, part (d). 


31. Describe the column space and row space of a simple matrix a * b and
give a basis for each. 


32. (a) Show that if a matrix A of rank k is expressed as the sum of k
simple matrices c_i * d_i, i = 1, 2, ..., k, then the c_i are a basis
of Col(A) and the d_i are a basis of Row(A).

(b) Prove the last sentence in Theorem 8, the minimum number of 
simple matrices whose sum is matrix A is rank(A), as follows: The 
LU decomposition yields a sum of k simple matrices equaling A, 
where k = rank(A), by the first part of Theorem 8; if fewer than 
k simple matrices could sum to A, use part (a) to show that then 
dim(Col(A)) < rank(A)—impossible. 


Approximate Solutions 
and Pseudoinverses 


This section presents a method for obtaining an approximate solution that 
can be used to ‘‘solve’’ an m-by-n system Ax = b that has no solution, that 
is, when b is not in the range of A. We seek a “‘solution’’ w that gives a 
vector p = Aw which is as close as possible to b. In the following
discussion, we use the euclidean norm

|a| = sqrt(a_1^2 + a_2^2 + ... + a_n^2),

because we will be treating the vectors p, b as points in euclidean space and
minimizing the distance of the error |b - p|.


Example 1. Refinery Problem Revisited


Recall the refinery model first presented in Section 1.2 with three 
refineries producing three petroleum-based products, heating oil, diesel 
oil, and gasoline. 


Heating oil: 20x_1 + 4x_2 + 4x_3 = 500
Diesel oil: 10x_1 + 14x_2 + 5x_3 = 850 (1)
Gasoline: 5x_1 + 5x_2 + 12x_3 = 1000


Suppose that the third refinery is out of service. We still want to 
attempt to produce the same amounts of these products. That is, we 
want to satisfy the system (as best we can) 


Heating oil: 20x_1 + 4x_2 = 500
Diesel oil: 10x_1 + 14x_2 = 850
Gasoline: 5x_1 + 5x_2 = 1000

or                                                  (2)

        [20]         [ 4]     [ 500]
    x_1 [10]  +  x_2 [14]  =  [ 850]
        [ 5]         [ 5]     [1000]


In Section 3.2 we solved (1) by Gaussian elimination and obtained the
solution x = [4 7/8, 33 1/8, 67 1/2]. Since this solution is unique and involves
a nonzero value for x_3, we shall not be able to solve (2) exactly.

Let A be the matrix of coefficients in (2) and let b be the right- 
side vector. We seek to minimize |b — Aw| (recall that we are using 
the euclidean distance as our norm). The approximation we want re- 
quires a vector w so that 


             [20]         [ 4]
    Aw = w_1 [10]  +  w_2 [14]
             [ 5]         [ 5]

is as close to

    [ 500]
    [ 850]
    [1000]

as possible.


This type of approximate solution is called a least-squares solution, 
because the euclidean distance |b — Aw| to be minimized involves a sum 
of squares. For such approximate solutions to be meaningful, we require 
that A have more rows than columns; otherwise, a more sophisticated theory
is needed. 


Recall that we encountered least-squares solutions in regression. We 
review the regression problem presented in Section 4.2. 


Example 2. Simple Linear Regression Reviewed 


We wanted to fit the three (x, y) points (0, 1), (2, 1), and (4, 4) to a
line of the form y = qx (where the x-value might be the number of
college mathematics courses taken and the y-value a score on some
test). The requirement qx = y for these points yields a system of
equations

    0q = 1
    2q = 1    or    qx = y, where x = [0, 2, 4], y = [1, 1, 4]    (3)
    4q = 4


Figure 5.4a shows the points to be estimated by this line, and Figure
5.4b shows the vectors y and qx in 3-space. Our goal is to find q so
that the estimates ŷ_i = qx_i in (3) are as close as possible to the true
y_i. A way to view qx in Figure 5.4b is as the projection of y onto the
line through x from the origin.

To obtain the value for q, we minimized the sum of the squares
of the errors (SSE).


SSE = Σ (qx_i - y_i)^2 = (0q - 1)^2 + (2q - 1)^2 + (4q - 4)^2

    = 20q^2 - 36q + 18 (4)


In Section 4.2 we found the optimal q by differentiating (4), setting
the derivative equal to 0, and obtaining q = .9 (see line y = .9x in
Figure 5.4a). We also calculated how to minimize SSE for an arbitrary
number of x-y pairs and obtained the formula


q = (Σ x_i y_i) / (Σ x_i^2) = (x · y) / (x · x) (5)


where x and y are the vectors of x- and y-values. Note that for this
example


q = (0 x 1 + 2 x 1 + 4 x 4) / (0 x 0 + 2 x 2 + 4 x 4) = 18/20 = .9 (6)


When the model y = qx + r is used, (3) is changed to


    0q + r = 1                                 [0 1]
    2q + r = 1    or    Aw = y, where A =      [2 1],   w = [q, r]
    4q + r = 4                                 [4 1]
Figure 5.4 (a) Regression estimates for points (0, 1), (2, 1), and (4, 4).
(b) y = 0.9x is regression solution when x = [0, 2, 4], y = [1, 1, 4].
(c) p is closest vector to b in range of A.


Now the least-squares solution is a pair q, r such that the vector
q[0, 2, 4] + r[1, 1, 1] (which is in the range of A) is as close as
possible to y (see Figure 5.4c, where p = Aw, b = y). Again, we
can view p as the projection of b onto the range of A.


We have now motivated the importance of finding w so that the vector 
p = Aw is as close as possible to a given vector b, that is, so that p is the 
projection of b onto the range of A. To determine w and p, we shall use 
the following geometric property of the projection p of b onto the range of 
A: The error vector b — p is at right angles to—perpendicular to—vectors 
in the range of A (see Figure 5.4b and c). 
The term orthogonal is used in linear algebra instead of "perpendicular"
to describe two vectors at right angles. The following theorem provides
a simple numerical test for orthogonality.


Theorem 1. Two n-vectors a, b are orthogonal if and only if their scalar
product is zero, a · b = 0.


This theorem is a consequence of a more general fact about angles 
between vectors that is proved later in this section. 

We use Theorem 1 to obtain a matrix equation satisfied by p (where
p = Aw) when b - p is orthogonal to vectors in the range of A (implying
that p is the closest vector to b in the range of A).

First we illustrate the procedure by obtaining a formula for q in the
simple regression model ŷ = qx in Example 2, that is, we find q so that qx
is as close as possible to y. The error vector in this case is y - qx, and the
range is simply all multiples of x. The error vector y - qx should be
orthogonal to the range, that is, orthogonal to x. By Theorem 1, this yields


x · (y - qx) = 0 or x · y - q x · x = 0


Solving for q, we obtain the same regression formula as in (5):


q = (x · y) / (x · x) (7)


As noted above, qx is the projection of vector y onto vector x. Thus


Theorem 2. The projection of y onto x (i.e., onto the line from the origin
through x) is qx, where q = x · y / x · x.
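Theorem 2 can be tried out on the regression data of Example 2. The Python sketch below is illustrative only; the helper names dot and project are hypothetical, and exact fractions are used so that the orthogonality check comes out exactly 0.

```python
from fractions import Fraction

def dot(a, b):
    """Scalar product of two vectors of equal length."""
    return sum(ai * bi for ai, bi in zip(a, b))

def project(y, x):
    """Return (q, qx), where qx is the projection of y onto the line through x."""
    q = Fraction(dot(x, y), dot(x, x))
    return q, [q * xi for xi in x]

x = [0, 2, 4]
y = [1, 1, 4]
q, p = project(y, x)
print(q)  # 9/10, the regression slope .9

# The error vector y - qx should be orthogonal to x.
error = [yi - pi for yi, pi in zip(y, p)]
print(dot(x, error))
```

The scalar product of x with the error vector is 0, which is the orthogonality condition used to derive (7).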


Next consider the general case where we want an approximate solution
to Ax = b for any m-by-n matrix A. The error vector b - p = b - Aw
should be orthogonal to every vector in the range of A. Recall that the range
of A is formed by linear combinations of the column vectors of A,
r_1 a_1^C + r_2 a_2^C + ... + r_n a_n^C. If b - Aw is orthogonal to every
linear combination of the column vectors, then it certainly must be orthogonal
to the column vectors a_i^C themselves. By Theorem 1 we have

\[ a_i^C \cdot (b - Aw) = 0 \quad\text{for } i = 1, 2, \ldots, n \tag{8} \]

If we make a matrix A* whose rows are the columns of A, then (8) gives

\[ A^{*}(b - Aw) = 0 \tag{9} \]

But this matrix A* is simply A^T, the transpose of A (whose rows are the
columns of A). So (9) is

\[ A^T(b - Aw) = 0 \quad\text{or}\quad (A^T A)w = A^T b \tag{10} \]


Assuming that the matrix A^T A is invertible, we can solve (10) for w to
obtain

\[ w = (A^T A)^{-1} A^T b \tag{11} \]

The right side of (11) is a pretty messy expression. Since the rows of
A^T are the columns of A, entry (i, j) in the matrix product A^T A is just the
scalar product a_i^C · a_j^C of the ith column of A with the jth column of A.
When A consists of a single column x, as in the regression model qx = y,
(11) reduces to q = (x · x)^{-1} x · y, or q = x · y / x · x, the formula we
obtained above.

We call the product of matrices on the right in (11)

\[ A^{+} = (A^T A)^{-1} A^T \tag{12} \]

the pseudoinverse of A (the term generalized inverse is also used). If A is
an m-by-n matrix, A^+ will be an n-by-m matrix.


Theorem 3. The least-squares solution w to the system of equations
Ax = b is w = A^+ b, where A^+ = (A^T A)^{-1} A^T. Further, A^+ is a
left inverse of A: A^+ A = I.


The second sentence of the theorem is easily verified: A^+ A =
(A^T A)^{-1}(A^T A) = I, since we are multiplying A^T A by its inverse. The
identity A^+ A = I can be used to check that you have computed A^+
correctly.

If A is an invertible n-by-n matrix, the pseudoinverse A^+ equals the
regular inverse A^{-1} (see Exercise 22). If b happens to lie in the range of
A, then w will be an exact solution, that is, Aw equals b.
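Formula (12) and the check A^+ A = I can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the 3-by-2 matrix A is hypothetical, the helper names (transpose, matmul, inverse, pseudoinverse) are ours, and the inverse is computed by ordinary Gauss-Jordan elimination rather than by any method prescribed in the text.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(a * b for a, b in zip(row, col)) for col in Qt] for row in P]

def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def pseudoinverse(A):
    """A^+ = (A^T A)^{-1} A^T, valid when the columns of A are independent."""
    At = transpose(A)
    return matmul(inverse(matmul(At, A)), At)

A = [[1, 0], [1, 1], [1, 2]]   # hypothetical 3-by-2 matrix
A_plus = pseudoinverse(A)      # a 2-by-3 matrix
check = matmul(A_plus, A)      # should be the 2-by-2 identity
print(check)
```

For poorly conditioned problems, forming (A^T A)^{-1} explicitly can lose accuracy, so a sketch like this is best reserved for small, well-behaved matrices.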

Although (12) is complex, the fact that such a matrix A^+ exists at all
is impressive. Applying Theorem 3 to the general regression model, we
obtain


Corollary. Consider the regression model ŷ = q_1 x_1 + q_2 x_2 + ... +
q_n x_n + r with associated matrix equation

\[ y = Xq \]

where y is the set of y-value observations, X is the matrix whose jth
column is the set of x_j-value observations and whose last column is
the 1's vector, and q = [q_1, q_2, ..., q_n, r]. Then the regression
model parameters q are given by

\[ q = (X^T X)^{-1} X^T y \tag{13} \]

Example 3. Least-Squares Solution to Refinery Problem

Let us find the least-squares solution to the system of equations we
had in Example 1, where the first two refineries alone had to try to
satisfy the demand vector.

\[
\begin{aligned}
20x_1 + 4x_2 &= 500 \\
10x_1 + 14x_2 &= 850 \\
5x_1 + 5x_2 &= 1000
\end{aligned}
\tag{14}
\]

If A is the coefficient matrix in (14), then we compute A^T A and
(A^T A)^{-1} to be [recall that entry (i, j) in A^T A is the scalar product of
columns i and j of A]

\[
A^T A = \begin{bmatrix} 525 & 245 \\ 245 & 237 \end{bmatrix},
\qquad
(A^T A)^{-1} = \begin{bmatrix} .00368 & -.00380 \\ -.00380 & .00815 \end{bmatrix}
\]

The pseudoinverse A^+ of A is

\[
A^{+} = (A^T A)^{-1} A^T
= \begin{bmatrix} .00368 & -.00380 \\ -.00380 & .00815 \end{bmatrix}
  \begin{bmatrix} 20 & 10 & 5 \\ 4 & 14 & 5 \end{bmatrix}
= \begin{bmatrix} .0584 & -.0164 & -.0006 \\ -.0435 & .0761 & .0217 \end{bmatrix}
\tag{15}
\]

With (15), we can now find the least-squares solution w to (14):

\[
w = A^{+} b
= \begin{bmatrix} .0584 & -.0164 & -.0006 \\ -.0435 & .0761 & .0217 \end{bmatrix}
  \begin{bmatrix} 500 \\ 850 \\ 1000 \end{bmatrix}
= \begin{bmatrix} 14.6 \\ 64.7 \end{bmatrix}
\tag{16}
\]


This solution produces the following approximating output vector:

Aw = [551, 1051, 394]

with an error vector of

b - Aw = [500 - 551, 850 - 1051, 1000 - 394]
       = [-51, -201, 606]

This is a terrible approximation. We vastly underproduce the
third product (gasoline). With the third refinery shut down, we shall
always get much more of the second product than the third product
[see (14)].
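The refinery computation can be reproduced with a short sketch in plain Python that solves the normal equations (10) directly for the two-column matrix of (14). The helper names are ours, and because the code carries full precision while the text rounds intermediate results, the output agrees with (16) and with the vector Aw above only to within rounding.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[20, 4], [10, 14], [5, 5]]   # coefficient matrix of (14)
b = [500, 850, 1000]              # demand vector

a1 = [row[0] for row in A]        # first column of A
a2 = [row[1] for row in A]        # second column of A

# Entries of A^T A and A^T b (scalar products of columns).
g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
r1, r2 = dot(a1, b), dot(a2, b)

# Solve the 2-by-2 normal equations (A^T A) w = A^T b by the determinant formula.
det = g11 * g22 - g12 * g12
w = [(g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det]

p = [row[0] * w[0] + row[1] * w[1] for row in A]   # Aw
e = [bi - pi for bi, pi in zip(b, p)]              # error vector b - Aw

print(w, p, e)
print(dot(a1, e), dot(a2, e))   # both should be essentially zero
```

The last line confirms the defining property of the least-squares solution: the error vector is orthogonal to each column of A.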
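Example 4. Solution to Regression Model ŷ = qx + r

Let us use the pseudoinverse to solve our regression problem with
points (0, 1), (2, 1), (4, 4) and the model ŷ = qx + r. The system
of equations is

\[
\begin{aligned}
0q + r &= 1 \\
2q + r &= 1 \\
4q + r &= 4
\end{aligned}
\qquad\text{or}\qquad Xq = y,
\]

\[
\text{where}\quad
X = \begin{bmatrix} 0 & 1 \\ 2 & 1 \\ 4 & 1 \end{bmatrix},\quad
y = \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix},\quad
q = \begin{bmatrix} q \\ r \end{bmatrix}
\tag{17}
\]

Then

\[
X^{+} = (X^T X)^{-1} X^T
= \begin{bmatrix} 20 & 6 \\ 6 & 3 \end{bmatrix}^{-1}
  \begin{bmatrix} 0 & 2 & 4 \\ 1 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} \frac{1}{8} & -\frac{1}{4} \\ -\frac{1}{4} & \frac{5}{6} \end{bmatrix}
  \begin{bmatrix} 0 & 2 & 4 \\ 1 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} -\frac{1}{4} & 0 & \frac{1}{4} \\ \frac{5}{6} & \frac{1}{3} & -\frac{1}{6} \end{bmatrix}
\]

Then

\[
q = X^{+} y
= \begin{bmatrix} -\frac{1}{4} & 0 & \frac{1}{4} \\ \frac{5}{6} & \frac{1}{3} & -\frac{1}{6} \end{bmatrix}
  \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix}
= \begin{bmatrix} \frac{3}{4} \\ \frac{1}{2} \end{bmatrix}
\]

So q = .75, r = .5; this is the same answer that we obtained for this
problem in Section 4.2 (see Figure 5.4a). Our regression estimates for
the y-values are given by Xq = [.5, 2, 3.5].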


Example 5. Least-Squares Polynomial Fitting

Suppose that we want to try to fit a quadratic curve through the set of
points (0, 7), (1, 5), (2, 4), (3, 4), (4, 8), and (5, 12) using a least-squares
approximation. Our model is

\[ \hat{y} = ax^2 + bx + c \tag{19} \]

We shall treat x^2 as a separate variable, say let z = x^2, so that
a linear multivariate regression model can be used

\[ \hat{y} = az + bx + c \tag{20} \]

For the given set of points our X matrix is

\[
X = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 1 & 1 \\ 4 & 2 & 1 \\ 9 & 3 & 1 \\ 16 & 4 & 1 \\ 25 & 5 & 1 \end{bmatrix}
\quad\text{with}\quad
y = \begin{bmatrix} 7 \\ 5 \\ 4 \\ 4 \\ 8 \\ 12 \end{bmatrix}
\quad\text{and}\quad
q = \begin{bmatrix} a \\ b \\ c \end{bmatrix}
\tag{21}
\]

Using a computer program, we obtain

\[
X^{+} = (X^T X)^{-1} X^T = \frac{1}{56}
\begin{bmatrix}
5 & -1 & -4 & -4 & -1 & 5 \\
-33 & .2 & 18.4 & 21.6 & 9.8 & -17 \\
46 & 18 & 0 & -8 & -6 & 6
\end{bmatrix}
\]

and

\[
q = X^{+} y = \begin{bmatrix} .893 \\ -3.493 \\ 7.214 \end{bmatrix}
\quad\text{with estimates}\quad
\hat{y} = Xq = \begin{bmatrix} 7.2 \\ 4.6 \\ 3.8 \\ 4.8 \\ 7.5 \\ 12.1 \end{bmatrix}
\tag{22}
\]

Our quadratic estimate is thus ŷ = .893x^2 - 3.493x + 7.214.
Although the estimated y-values work out closely to the observed
y-values, a word of warning is important. This is a very poorly
conditioned problem: the columns of X are all fairly similar. In fact, the
condition number of the matrix (X^T X) is 2000 (in the sum norm)! A
small change in the data could produce a large change in our answer.
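The quadratic fit of Example 5 can be reproduced exactly with rational arithmetic. The sketch below is one possible way to do it, not the computer program the text refers to: it builds X from the six data points, forms the normal equations (X^T X)q = X^T y, and solves them by Gauss-Jordan elimination using Python's fractions module. The helper name solve is ours.

```python
from fractions import Fraction

xs = [0, 1, 2, 3, 4, 5]
ys = [7, 5, 4, 4, 8, 12]

# Rows of X are [x^2, x, 1], as in (21).
X = [[Fraction(x * x), Fraction(x), Fraction(1)] for x in xs]

def solve(M, rhs):
    """Solve M z = rhs by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

# Normal equations: entries of X^T X and X^T y.
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(3)]

a, b, c = solve(XtX, Xty)
estimates = [a * x * x + b * x + c for x in xs]

print(float(a), float(b), float(c))
print([round(float(v), 3) for v in estimates])
```

The exact coefficients are a = 25/28, b = -489/140, and c = 101/14, which round to the values .893, -3.493, and 7.214 quoted in the example.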


To compute the pseudoinverse (A^T A)^{-1} A^T, we need to know that the
matrix A^T A is invertible. The following result gives us the information we
need.


Theorem 4. Let A be an m-by-n matrix. Then the n-by-n matrix A^T A is
invertible (and the pseudoinverse A^+ exists) if the n columns of A are
linearly independent, or equivalently, if rank(A) = n.

Proof (Optional). We shall prove the stronger result that
rank(A^T A) = rank(A). Since rank(A) = n, then rank(A^T A) = n, and
any n-by-n matrix of rank n is invertible.

We work with null spaces. By Corollary A to Theorem 3 of
Section 5.2, for a matrix B with n columns, dim(Null(B)) = n -
rank(B). Then rank(A^T A) = rank(A) is a consequence of showing that
Null(A^T A) = Null(A).

Let x be any vector in Null(A), that is, Ax = 0. Then

\[ (A^T A)x = A^T(Ax) = A^T(0) = 0 \]

so x is in Null(A^T A).

Conversely, let y be any vector in Null(A^T A). We want to show
that y is in Null(A). To do this, we show that (Ay) · (Ay) = 0, which
implies that Ay = 0 (since c · c = 0 means Σ c_i^2 = 0, which forces every
c_i to be 0). We need the fact that (Ay) · (Ay) is the same as (Ay)^T(Ay)
(the latter is the product of the 1-by-m matrix (Ay)^T times the m-by-1
matrix Ay). Then we have

\[
(Ay) \cdot (Ay) = (Ay)^T(Ay) = (y^T A^T)(Ay)
= y^T(A^T A y) = y^T 0 = 0
\]
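The role of linear independence in Theorem 4 can be seen in a small numerical sketch. The two 3-by-2 matrices below are hypothetical examples (not from the text), and the helper names gram and det2 are ours. When the second column is a multiple of the first, the determinant of A^T A is zero, so A^T A has no inverse and formula (12) cannot be applied.

```python
def gram(A):
    """Return A^T A: entry (i, j) is the scalar product of columns i and j of A."""
    cols = list(zip(*A))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def det2(M):
    """Determinant of a 2-by-2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

independent = [[1, 0], [1, 1], [1, 2]]   # columns are linearly independent
dependent = [[1, 2], [2, 4], [3, 6]]     # second column is twice the first

det_indep = det2(gram(independent))
det_dep = det2(gram(dependent))

print(det_indep, det_dep)
```

The first determinant is nonzero (so the pseudoinverse exists), while the second is exactly zero.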


We note that in regression problems, practical considerations dictate 
that the matrix X is virtually certain to have linearly independent columns. 

There is an important special case in which the computation of the
pseudoinverse becomes very easy. This is when the columns of the matrix
A are orthogonal. Now by Theorem 1, the scalar product of two different
columns a_i^C · a_j^C (i ≠ j) equals 0. Since entry (i, j) in A^T A is exactly
this scalar product, A^T A will be all 0's except on the main diagonal. This
simple form of A^T A leads to a simple form for (A^T A)^{-1} and for A^+.


Example 6. Regression with Orthogonal Columns

We shall repeat the analysis of Example 4 with points (0, 1), (2, 1),
and (4, 4) and regression model ŷ = qx + r, but we shall shift the
x-values so that the average x-value is 0 (in Section 4.2 we noted that
such a shift simplified our regression formulas). The average x-value
is (0 + 2 + 4)/3 = 2. If we subtract 2 from each x-value, obtaining
points (-2, 1), (0, 1), and (2, 4), then the new average x-value is 0
(subtracting off the average value always makes the new average
be 0).

Let us repeat the pseudoinverse computations of Example 4 for
these new points.

\[
\begin{aligned}
-2q + r &= 1 \\
0q + r &= 1 \\
2q + r &= 4
\end{aligned}
\qquad\text{or}\qquad Xq = y,
\]

\[
\text{where}\quad
X = \begin{bmatrix} -2 & 1 \\ 0 & 1 \\ 2 & 1 \end{bmatrix},\quad
y = \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix}
\tag{23}
\]

Observe that the two columns of X are now orthogonal. The
scalar product of the two columns x · 1 = Σ x_i is 0, since the average
x-value (= Σ x_i / m) is 0. Then

\[
X^T X = \begin{bmatrix} 8 & 0 \\ 0 & 3 \end{bmatrix}
\quad\text{and}\quad
(X^T X)^{-1} = \begin{bmatrix} \frac{1}{8} & 0 \\ 0 & \frac{1}{3} \end{bmatrix}
\tag{24}
\]

The inverse (X^T X)^{-1} in (24) can be computed by the determinant
formula, but we note that the inverse of a diagonal matrix D (with 0's
everywhere off the main diagonal) is obtained by replacing each
diagonal entry with its inverse, as in (24).

The two diagonal entries in X^T X are, in symbolic terms, x · x
and 1 · 1 = m (number of points in regression problem). Thus, when
the average x-value is 0, (X^T X)^{-1} has the form

\[
(X^T X)^{-1} = \begin{bmatrix} \dfrac{1}{x \cdot x} & 0 \\ 0 & \dfrac{1}{m} \end{bmatrix}
\tag{25}
\]

The pseudoinverse X^+ is now

\[
X^{+} = (X^T X)^{-1} X^T
= \begin{bmatrix} \frac{1}{8} & 0 \\ 0 & \frac{1}{3} \end{bmatrix}
  \begin{bmatrix} -2 & 0 & 2 \\ 1 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} -\frac{1}{4} & 0 & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}
\tag{26}
\]

When we premultiply any matrix B by a diagonal matrix D, then D
has the effect of multiplying the ith row of B by the ith diagonal entry
of D, as in (26).

Looking at the values of the diagonal entries in (X^T X)^{-1} [see
(25)], we see that X^+ is simply the transpose of X with the first column
of X divided by its sum of squares (x · x) and the second column
divided by m (the number of points).

Finally,

\[
\begin{bmatrix} q \\ r \end{bmatrix} = X^{+} y
= \begin{bmatrix} -\frac{1}{4} & 0 & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}
  \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix}
= \begin{bmatrix} \frac{3}{4} \\ 2 \end{bmatrix}
\]

Observe that q is the scalar product of the first row of X^+ with y. But
we just noted that the first row of X^+ is simply x divided by the number
x · x. Similarly, r equals the scalar product of the second row of X^+
with y, and this second row is just (1/m)1. Thus we have the simple
formulas

\[
q = \frac{x \cdot y}{x \cdot x}, \qquad
r = \frac{1 \cdot y}{m} \;(= \text{average } y\text{-value})
\tag{27}
\]



By Theorem 2, q and r are simply the projections of y onto x and 1,
respectively. The nice results obtained in Example 6 will be true for the
pseudoinverse of any matrix with orthogonal columns.


Theorem 5. If the m-by-n matrix A (m > n) has orthogonal columns, the
pseudoinverse A^+ is obtained by dividing each column of A by the
sum of the squares of the column's entries and then taking the transpose
of the resulting matrix; the ith row of A^+ is a_i^C/(a_i^C · a_i^C). Further, the
least-squares solution w = A^+ b is just the projection of b onto the
columns of A: w_i = a_i^C · b / a_i^C · a_i^C.
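Theorem 5 translates into a few lines of code. The sketch below, in plain Python with exact fractions, applies the theorem to the shifted regression data of Example 6; the function name pinv_orthogonal_columns is ours, and the function assumes (without checking) that the columns passed to it really are orthogonal.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pinv_orthogonal_columns(A):
    """Pseudoinverse of a matrix with mutually orthogonal columns:
    row i of the result is column i of A divided by its sum of squares."""
    cols = [list(c) for c in zip(*A)]
    return [[Fraction(v) / dot(c, c) for v in c] for c in cols]

# Shifted data of Example 6: x-values -2, 0, 2 and a column of 1's.
X = [[-2, 1], [0, 1], [2, 1]]
y = [1, 1, 4]

X_plus = pinv_orthogonal_columns(X)
q, r = (dot(row, y) for row in X_plus)

print(X_plus)
print(q, r)
```

The result is q = 3/4 and r = 2, with r equal to the average y-value as formula (27) predicts.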


Suppose that we have a regression model with several input variables,
such as

\[ \hat{y} = q_1 u + q_2 v + q_3 x + r \tag{28} \]

and suppose that the vectors u, v, x, and 1 (of the u-values, v-values,
x-values, and 1's vector) are orthogonal. Then Theorem 5 tells us,
generalizing (27), that the regression parameters are the projections of y
onto u, v, x, and 1:

\[
q_1 = \frac{u \cdot y}{u \cdot u}, \quad
q_2 = \frac{v \cdot y}{v \cdot v}, \quad
q_3 = \frac{x \cdot y}{x \cdot x}, \quad
r = \frac{1 \cdot y}{1 \cdot 1}
\tag{29}
\]


But what chance is there that the u, v, x, and 1 vectors will be or- 
thogonal? The answer is often up to the person who collects the data. If the 
u-, v-, and x-values measure settings of control knobs on a complex machine 
and the y-value measures the task performed by the machine, then a re- 
searcher who knows about Theorem 5 could pick settings to make the vectors 
orthogonal. This is a problem in a statistical subject called design of exper- 
iments. 

We asserted in Theorem 1 that if a and b are orthogonal, that is, they
form an angle of 90°, then a · b = 0. This result was central to the derivation
of the pseudoinverse. Now we prove the following theorem about the angle
between two vectors, and Theorem 1 follows directly from this result. We
measure the angle between two vectors a, b by treating the vectors as line
segments from the origin to the points with coordinates given by a and b
(see Figure 5.5).


Theorem 6. The cosine of the angle θ between any vectors a, b is

\[ \cos\theta = \frac{a \cdot b}{|a|\,|b|} \tag{30} \]

If a and b are unit-length vectors, (30) becomes cos θ = a · b.

Figure 5.5 θ is the angle between a = [3, 4] and b = [12, 5].

Example 7. Examples of Angles Between Vectors

(i) If a = [1, 0, 0] and b = [0, 1, 0], then cos θ = a · b = 0 and
we conclude that a, b form a 90° angle, that is, they are
orthogonal.

(ii) If a = [3, 4] and b = [12, 5], then |a| = 5 (= √(9 + 16))
and |b| = 13 (= √(144 + 25)). So cos θ = a · b/(|a| |b|) =
(3·12 + 4·5)/(5·13) = 56/65 = .86. The angle with a cosine of .86
is about 31° (see Figure 5.5).

(iii) If a = [.6, .8], with |a| = 1, and b = [1, 0], then cos θ =
a · b = .6·1 + .8·0 = .6, which is just the first coordinate of a.
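Formula (30) is easy to evaluate by machine. The following sketch in plain Python (the function names cos_angle and angle_degrees are ours) recomputes parts (i) and (ii) of Example 7 using the standard math module.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Euclidean norm |v|."""
    return math.sqrt(dot(v, v))

def cos_angle(a, b):
    """Cosine of the angle between a and b, formula (30)."""
    return dot(a, b) / (norm(a) * norm(b))

def angle_degrees(a, b):
    return math.degrees(math.acos(cos_angle(a, b)))

c = cos_angle([3, 4], [12, 5])              # part (ii): 56/65
deg = angle_degrees([3, 4], [12, 5])        # between 30 and 31 degrees
right = angle_degrees([1, 0, 0], [0, 1, 0]) # part (i): a right angle

print(c, deg, right)
```

For nearly parallel vectors, rounding can push the computed cosine slightly outside the interval from -1 to 1, so a more defensive version would clamp the value before calling math.acos.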


The proof of Theorem 6 uses the law of cosines:

\[ |a - b|^2 = |a|^2 + |b|^2 - 2|a|\,|b|\cos\theta \tag{31} \]

The square of the euclidean norm |c|^2 is simply c · c, and we can write
|a - b|^2 as (a - b) · (a - b). Expanding with matrix algebra, we have

\[
\begin{aligned}
|a - b|^2 &= (a - b) \cdot (a - b) \\
          &= a \cdot a + b \cdot b - 2\,a \cdot b \\
          &= |a|^2 + |b|^2 - 2\,a \cdot b
\end{aligned}
\tag{32}
\]

The right side of (32) is the same as the right side of (31) except for the last
terms. So these last terms must be equal:

\[ -2|a|\,|b|\cos\theta = -2\,a \cdot b \]

Solving for cos θ yields Theorem 6.

Theorem 6 has a very important application in statistics. The cosine
of the angle θ(x, y) between two vectors x and y tells us if the vectors are
close together [when cos θ(x, y) is near 1] or opposites of one another [when
cos θ(x, y) is near -1], or are unrelated, that is, close to orthogonal
[when cos θ(x, y) is near 0].

Suppose that x and y are vectors of data from an experiment, say x is
the scores of 10 students on a math test and y is the scores of the 10 students
on a language test. Then cos θ(x, y) tells us how closely related these two
sets of data are and helps us predict future relations between math and
language scores. If cos θ(x, y) is .8, performance on these two tests is
closely related and we can view a student's score on one test as a reasonably
good predictor of how he or she will do on the other test. If cos θ(x, y) is
-.7, the score vectors point in almost opposite directions and a high score
on one test is very likely to go with a below-average score on the other test.
If cos θ(x, y) = 0 (the vectors x, y are orthogonal), then performance on
one test tells us nothing about the likely performance on the other test (in
statistics, one says that the two sets of data are uncorrelated).


Definition. Let x = [x_1, x_2, ..., x_n] and y = [y_1, y_2, ..., y_n] be two
sets of observations with the property that the average x-value and
the average y-value are each 0. Then the correlation coefficient
Cor(x, y) of x and y is defined to be cos θ(x, y).

\[
\mathrm{Cor}(x, y) = \frac{x \cdot y}{|x|\,|y|}
= \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\,\sqrt{\sum y_i^2}}
\tag{33}
\]

Recall that the average x-value is x̄ = (1/n) Σ x_i. If x̄ ≠ 0, we can
subtract x̄ from each x_i to get a revised vector that does have an average
value of 0. Similarly for y-values. We need an average value of 0 so that
the opposite of a high score (a positive value) will be a low score (a negative
value). This way the terms x_i y_i in (33) for pairs of oppositely correlated
entries x_i, y_i will be negative (when x_i y_i is the product of a positive and a
negative number), leading to a negative correlation.


Example 8. Correlation Coefficient

Suppose that we ask the eight faculty members of the Podunk
University Alchemy Department to rate the quality of their graduate
students and we poll the students to get a rating of the quality of each of
the eight. The results of our experiment are presented in Table 5.1
(where we have processed the data to make the average value 0 in each
category).

Table 5.1

Faculty          Quality of Students (x_i)    Student Rating (y_i)
1. Aristotle              +5                         +2
2. Galileo                -5                         -7
3. Goldbrick              -2                          0
4. Hasbeen                +3                         -1
5. Leadbottom             -4                         -5
6. Merlin                 +5                         +3
7. Midas                  +5                          0
8. Santa Claus            -7                         +8

Applying formula (33) to these data, we obtain

\[
\mathrm{Cor}(x, y) = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\,\sqrt{\sum y_i^2}}
= \frac{5 \cdot 2 + (-5)(-7) + \cdots + (-7) \cdot 8}{\sqrt{178} \cdot \sqrt{152}}
= \frac{21}{164.5} = .13
\]


Looking back at the data in Table 5.1, we are a little surprised 
to see such a low correlation, since the numbers in the two columns 
correspond fairly well for most faculty with the glaring exception of 
Santa Claus. Statisticians would call Santa Claus’s data pair (—7, 8) 
an outlier, an observation that fits poorly with the rest of the data. We 
warned in the regression section (Section 4.2) that one or two outliers 
can distort a statistical analysis. (A little investigating reveals that Santa 
Claus is a terrible teacher but is still well liked because he gives the 
students lots of candy every December.) 

Let us throw out Santa Claus’s numbers and recompute the cor- 
relation coefficient. This requires us to adjust the data so that the 
averages in each column are again 0. The new numbers are shown in 
Table 5.2. 


Table 5.2

Faculty          Quality of Students (x_i)    Students' Rating (y_i)
1. Aristotle              +4                         +3
2. Galileo                -6                         -6
3. Goldbrick              -3                         +1
4. Hasbeen                +2                          0
5. Leadbottom             -5                         -4
6. Merlin                 +4                         +4
7. Midas                  +4                         +1

The new correlation coefficient is

\[
\mathrm{Cor}(x, y) = \frac{85}{\sqrt{122} \cdot \sqrt{79}} = \frac{85}{98.2} = .87
\]

a high degree of correlation.
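The correlation computations of Example 8 can be sketched in plain Python as follows. Two caveats apply. First, a few ratings in the printed Table 5.1 are hard to read; the values -7 (Galileo), -5 (Leadbottom), and 0 (Midas) used below are an assumption, chosen because they make the ratings average 0 and give the sum of squares 152 that appears in the computation. Second, the function re-centers the data exactly, whereas Table 5.2 rounds the shifted ratings to whole numbers, so the second result may differ slightly from the hand computation.

```python
import math

def center(v):
    """Subtract the average so that the new average is 0."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def correlation(x, y):
    """Correlation coefficient: cosine of the angle between the centered vectors."""
    xc, yc = center(x), center(y)
    num = sum(a * b for a, b in zip(xc, yc))
    return num / math.sqrt(sum(a * a for a in xc) * sum(b * b for b in yc))

# Table 5.1 data (some y entries assumed, see the note above).
x = [5, -5, -2, 3, -4, 5, 5, -7]
y = [2, -7, 0, -1, -5, 3, 0, 8]

cor_all = correlation(x, y)
cor_without_outlier = correlation(x[:-1], y[:-1])   # drop Santa Claus

print(round(cor_all, 3), round(cor_without_outlier, 3))
```

Dropping the single outlying pair raises the correlation from roughly .13 to roughly .87, which is the point of the example.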


We finish this section with a discussion of orthogonal vector spaces
associated with least-squares solutions. A least-squares solution w to
Ax = b involves breaking b into two parts,

\[ b = p + (b - p) \tag{34} \]

where p = Aw is the projection of b onto the range of A (w being the
least-squares solution) and b - p is the error vector.
Recall that p and b - p must be orthogonal. The decomposition of b in
(34) into a range vector and an error vector is unique (p is the unique vector Aw).
We assume here that A has linearly independent columns.

The vector p is in the range of A (= column space of A). Now we
shall identify the vector space V containing b - p. V is the error space of
A consisting of all error vectors; these are vectors e orthogonal to the
columns of A. This means that a_i^C · e = 0 for all columns a_i^C of A.

Earlier in this section we expressed the fact that the error vector
b - Aw was orthogonal to all the columns of A as A^T(b - Aw) = 0. That
is, A^T has rows that are the columns of A. Then the error space V can be
defined:

\[ V = \{v : A^T v = 0\} \tag{35} \]

But from (35), we see that V is simply the null space Null(A^T) of A^T. Thus
Range(A) is orthogonal to Null(A^T). Here we are calling two vector spaces
orthogonal if all pairs of vectors, one from each, are orthogonal.

Let us next determine the dimension of the error space [= Null(A^T)].
By Theorem 3 of Section 5.2, the dimension of Null(A^T) is m -
rank(A^T) (where m is the number of columns in A^T). So

\[ \dim(\text{Error space}(A)) = m - \mathrm{rank}(A^T) \tag{36} \]

But rank(A^T) = rank(A) (this simple consequence of Theorem 4 of
Section 5.2 is proved in Exercise 18 of that section). So (36) is the same as

\[ \dim(\text{Error space}(A)) = m - \mathrm{rank}(A) \tag{37} \]

Using the fact that the dimension of Range(A) is rank(A), we have the
expected result [in light of (34)]:

\[ \dim(\text{Range}(A)) + \dim(\text{Error space}(A)) = m \tag{38} \]

Summarizing, we have

Theorem 7
(i) Let A be an m-by-n matrix and b be any m-vector. Then b can be
written as a unique sum

\[ b = b_1 + b_2 \tag{39} \]

where b_1 is in Range(A), b_2 is in Error space(A), and b_1, b_2 are
orthogonal. Further,

\[ \dim(\text{Range}(A)) + \dim(\text{Error space}(A)) = m \]

The vector b_1 in (39) equals Aw and is the projection of b onto
the column space of A; w is the least-squares solution w = A^+ b.

(ii) The error space of A equals Null(A^T), so Null(A^T) is orthogonal
to Range(A).
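The decomposition (39) can be illustrated with the regression matrix of Example 4. The sketch below, in plain Python with exact fractions, solves the 2-by-2 normal equations for w, sets b_1 = Aw and b_2 = b - b_1, and then checks that b_2 lies in Null(A^T) and is orthogonal to b_1. The variable names are ours.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[0, 1], [2, 1], [4, 1]]   # matrix X of Example 4
b = [1, 1, 4]

cols = [list(c) for c in zip(*A)]
c0, c1 = cols

# Solve (A^T A) w = A^T b by the 2-by-2 determinant formula.
g11, g12, g22 = dot(c0, c0), dot(c0, c1), dot(c1, c1)
r1, r2 = dot(c0, b), dot(c1, b)
det = g11 * g22 - g12 * g12
w = [Fraction(g22 * r1 - g12 * r2, det), Fraction(g11 * r2 - g12 * r1, det)]

b1 = [row[0] * w[0] + row[1] * w[1] for row in A]   # part in Range(A)
b2 = [bi - pi for bi, pi in zip(b, b1)]             # part in the error space

print(w, b1, b2)
print([dot(c, b2) for c in cols], dot(b1, b2))      # all zeros
```

Here b_1 = [1/2, 2, 7/2] and b_2 = [1/2, -1, 1/2]; every column of A has zero scalar product with b_2, so A^T b_2 = 0 as part (ii) asserts.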


Theorem 7 is valid even if A does not have linearly independent columns.
Recall that Range(A) equals Col(A), the column space of A, which
equals Row(A^T). So Theorem 7, part (ii) implies that Null(A^T) is orthogonal
to Row(A^T), or, interchanging the names of A and A^T, Null(A) is orthogonal
to Row(A). Reinterpreting (39) in terms of these vector spaces, we have


Theorem 8. Let A be an m-by-n matrix. The row space of A and the null
space of A are orthogonal. Further, any n-vector x can be written as
a unique sum

\[ x = x_1 + x_2 \tag{40} \]

where x_1 is in Row(A) and x_2 is in Null(A).


Note that Null(A) being orthogonal to Row(A) follows directly from
the definition of Null(A): x is in Null(A) when Ax = 0, but Ax = 0 just
says that x is orthogonal to the rows of A. However, the unique sum (40)
is not so obvious.

The reader should recall Theorem | of Section 5.1, which asserted that 
any solution x’ to Ax = b could be expressed as x’ = x* + x°, where x* 
is some particular solution to Ax = b and x° is some solution to the ho- 
mogeneous system Ax = 0 [i.e., x° is in Null(A)]. Theorem 8 tells us that 
the decomposition of x’ can be chosen so that x* is in the row space of A. 


Corollary A. Let x' be a solution to the system Ax = b. Then x' can be
uniquely decomposed x' = x* + x°, where x* is in the row space of
A and x° is in the null space of A. Further, x* and x° are orthogonal.


Note in Corollary A that if Ax' = b, then Ax* + Ax° = b (since
x' = x* + x°). But Ax° = 0 since x° is in Null(A), so Ax* = b. We
have thus proved the surprising result:


Corollary B. If the system Ax = b has a solution, it has a solution x* that 
lies in the row space of A. 
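A one-equation system gives a small illustration of Corollaries A and B. In the sketch below the system x_1 + 2x_2 = 5 and the starting solution x' = [5, 0] are hypothetical choices, not taken from the text. Because A has a single row a, its row space is the line through a, so the row-space part x* of x' is just the projection of x' onto a given by Theorem 2.

```python
from fractions import Fraction

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

a = [1, 2]          # the single row of the hypothetical matrix A
b = 5               # right-hand side of x_1 + 2 x_2 = 5
x_prime = [5, 0]    # one solution of the system

# Project x' onto the row a (Theorem 2): x* = coef * a lies in Row(A).
coef = Fraction(dot(a, x_prime), dot(a, a))
x_star = [coef * ai for ai in a]
x_null = [xp - xs for xp, xs in zip(x_prime, x_star)]   # lies in Null(A)

print(x_star, x_null)
print(dot(a, x_star), dot(a, x_null), dot(x_star, x_null))
```

The row-space part x* = [1, 2] is itself a solution, the remainder [4, -2] is sent to 0 by A, and the two parts are orthogonal.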


Section 5.3 Exercises 


Summary of Exercises 

Exercises 1-16 involve regression and least-squares solutions; when asked
if a 2-by-2 matrix is poorly conditioned, say yes if the condition number is
≥ 10. Exercises 17-21 involve angles between vectors and the correlation
coefficient. Exercises 22-30 involve examples and extensions of
vector-space theory.

1. Determine the condition number of the matrix (A^T A) in Example 3. Is
this matrix poorly conditioned?

2. Seven students earned the following scores on a test after studying the
subject matter different numbers of weeks.


Student A wee ES OE ae OG 


Length of Study (x,) 


Test Score (y;) So me =F ee Be PB 8S 


Fit these data with a regression model of the form y = gx + r. De- 
termine g and r by computing the pseudoinverse of X, the matrix whose 
first column is the vector of x,’s and whose second column is a l’s 
vector. Plot the observed scores and the predicted scores. 


3. The following data indicate the numbers of accidents bus drivers had
in one year as a function of the numbers of years on the job.


Years on Job (x;) 


Accidents (y,) ix .-3. ay 38. oe Ss 


(a) Fit these data with a regression model of the form y = gx + r. 
Determine g and r by computing the pseudoinverse of X, the matrix 
whose first column is the vector of x;’s and whose second column 
is a 1's vector. 

(b) What is the condition number of the matrix (X‘X)? Is the problem 
poorly conditioned? 

(c) Repeat the calculations in part (a) by first shifting the x-values to 
make the average x-value be 0 (see Example 6). 


4. The following data show the GPA and the job salary (five years after
graduation) of six mathematics majors from Podunk U.


Salary 25,000 38,000 28,000 35,000 30,000 32,000 


(a) Fit these data with a regression model of the form y = gx + r 
using pseudoinverses. 

(b) What is the condition number of the matrix (X7X)? Is the problem 
poorly conditioned? 

(c) Repeat the calculations in part (a) by first shifting the x-values to 
make the average x-value be 0 (see Example 6). 
5. Compute the pseudoinverse, and then solve the refinery problem in
Example 1 when refinery 1 is shut down (the other two refineries
operate).


6. (a) Compute the pseudoinverse, and then solve the refinery problem
in Example 1 when refinery 2 is shut down (the other two refineries
operate).

(b) Compute the error vector e = b - Aw for part (a). Compute the
angle between the error vector e in part (a) and the solution vector
Aw. It should be about 90°. Is it?

(c) Which refinery closing, of the three refineries, has the smallest error
vector (in the sum norm)? This assumes that you have done
Exercise 5.


7. Compute the pseudoinverse of the following matrices. 


ws 
- 4 (b) | 2 (} } 2 =1 
3 ‘7 Gh 
Sie tin ad ye ES, 
lw a’ # To oo 
ca a om a (i Mee 
epobsk 2 iy oie al 


8. In each case, find the linear combination of the first two vectors that is 
as close as possible to the third vector. 
(a) fb, 2, 14, f2,'6,. =k ts, —4..4 
(b) [1, 0, 1], [0, 1, 1]; [0, 0, 5] 
(c) (0, —2, 3}, [1, 1, 1]; 11, —5, 10] 
(d) (2, 0, 1], [—1, 0, 1]; [4, 3, 2] 
fe) f0..1,.1, 0], 1, —1, -14.1} 22,9, 2, 6 


9. (a) Factory A produces 30 cars, 40 light trucks, and 20 heavy trucks 
per day, while factory B produces 60 cars, 20 light trucks, and 20 
heavy trucks a day. If the monthly demand is 1000 cars, 500 light 
trucks, and 400 heavy trucks, what is the least-squares solution 
(days of production for each factory)? 

(b) If the monthly demand increased by 10 cars, how much longer 
would factory A have to work each month? 
Hint: See Example 4 of Section 3.3. 

(c) If the monthly demand increased by 10 light trucks and 5 heavy 
trucks, how much longer would factory B have to work each 
month? 


10. (a) Bureaucratic office A produces 40 new regulations, inspects 90
defective appliances, and approves 300 applications a week.
Bureaucratic office B produces 80 new regulations, inspects 40
defective appliances, and approves 200 applications a week. How many
weeks would each office have to work in order to best approximate
(in the least-squares sense) a demand of producing 1000 new
regulations, inspecting 700 defective appliances, and approving 2000
applications?

(b) What is the condition number of the matrix (A^T A) in the
pseudoinverse computations? Is this problem poorly conditioned?

(c) If the demand for new regulations increased by 10, how much
longer would office A have to work?


11. Consider the regression problem in which high school GPA and total
SAT score (verbal plus math) are used to predict a person's college
GPA:


total SAT 
GPA college = q,(GPA hi sch) + q, (essary 


Suppose that our data for five people are as follows: 


GPA.,, GPA, SAT 


moat > 


(a) Compute the pseudoinverse (X^T X)^{-1} X^T. In the process, determine
the condition number of (X^T X). Is this problem poorly conditioned?

(b) Determine q_1, q_2, and r.

(c) Determine the error vector e (differences between true GPA-college
and estimated GPA-college). Is it orthogonal to the estimated
GPA-college vector?


12. In Example 5, re-solve the quadratic least-squares approximation
problem for the following data points. Note that for parts (a) and (b) the
x-values are the same, so the pseudoinverse will be the same (just use
the X^+ in the text).

(a) Same as in Example 5 but the fourth point is (3, 5).

(b) Same as in Example 5 but the first point is (0, 9).

(c) (0, 7), (1, 5), (2, 7), (3, 9), (4, 13)

(d) (-2, 7), (-1, 5), (0, 4), (1, 4), (2, 8), (3, 12); how is this problem
related to the original problem in Example 5?

(e) (0, 2), (1, 4), (2, 10)


13. Fit a cubic polynomial to the following data points using the same idea
as in the quadratic fit in Example 5: (-1, -2), (0, 3), (1, 2), (2, 8),
(3, 12), (4, 100). What is the condition number of (X^T X)?
14. Consider the regression model ẑ = qx + ry + s for the following data,
where the x-value is a scaled score (to have average value of 0) of high
school grades, the y-value is a scaled score of SAT scores, and the
z-value is a scaled score of college grades.


Determine q, r, and s. Note that the x, y, and 1 vectors (in the regression
matrix equation ẑ = qx + ry + s1) are orthogonal.


15. Consider the regression model ŷ_i = qx_i, i = 1, 2, ..., n. Compute
the pseudoinverse for this regression problem and solve for q (in terms
of the x_i, y_i values). As a matrix system Xq = y, the matrix X is the
n-by-1 column vector of x-values and y is the vector of y-values. Your
answer should agree with the formula for q in equation (7).


16. Use a geometric picture to explain why if Aw is very close to b, then
the error vector b - Aw may not be exactly orthogonal to the projection
vector Aw in a least-squares solution to Ax = b.


17. Compute the cosine of the angle, and determine the angle, made by the
following pairs of vectors.

(a) [1, 0], [1, 1] (b) [3, 4], [-3, 4] (c) [1, 2], [3, 1]
(d) [1, 0, 1], [0, 1, 0] (e): TE. ts. mb] 

ay 1a. t, Ra 8s 


18. Compute the correlation coefficient between the vectors of x- and
y-values in Exercise 14, and between the vectors of x- and z-values in
Exercise 14.

19. The following data show scores that three students received on a battery
of six different tests.

                Gerry    Jimmie    Ronnie
General IQ        12       20        10
Mathematics        8       22         4
Reading           16       14        10
Running           24       16        12
Speaking          12       10        30
Watching          12       14         8

Compute the correlation coefficient between
(a) Gerry and Jimmie (b) Gerry and Ronnie
(c) Jimmie and Ronnie

Hint: Remember first to subtract the average value from each number.


20. (a) Compute the correlation coefficient of the following readings from
eight students of their IQs and scores at Zaxxon.


Student A B G D E F G H 


Zaxxon 11,000 7,000 10,000 12,000 8,000 100,000 8,000 6,000 


(b) Delete student F and recompute the correlation coefficient. 


Hint: Remember first to subtract the average value from each 
number. 


21. Suppose that there are two dials A and B on a machine that produces
steel. We want to find out how settings a_i, b_i of the two dials affect the
quality c_i of the steel. We use a regression model ĉ = pa + qb + r.
For each of the following vectors a of settings for dial A, find a vector
b of settings of dial B that is orthogonal to the dial A vector and also
orthogonal to the 1's vector.

(a) a = [2, 1, 0, -1, -2] (b) a = [-4, -1, 0, 2, 3]

(c) a = [2, 6, 1, -4, -3, -2]


22. Prove that the pseudoinverse A^+ equals the true inverse A^{-1} if the
n-by-n matrix A is invertible (and hence has rank n).


23. (a) Show that in the euclidean norm |a - b| ≤ |a| + |b| by squaring
both sides of this inequality and using the law of cosines (on the
left side).

(b) Show that in the euclidean norm |a + c| ≤ |a| + |c| by letting
b = -c (and hence -b = c) and using part (a).


24. Show that if a is orthogonal to b and c, then a cannot be linearly
dependent on b and c.


25. If the vectors in a basis of vector space V are mutually orthogonal to
the vectors in a basis of vector space W, show that every vector in V
is orthogonal to every vector in W.


26. For each of the following matrices A, express the 1's vector 1 (of the
appropriate length) as the unique sum of two vectors, 1 = b_1 + b_2,
such that b_1 is in Range(A) and b_2 is in the error space of A. This
unique sum exists by Theorem 7, part (i).


ft 1 1 0 
(@) |, (b) | 2 oy 12 =1 
3 eng 
-. th 4 0 i 
tn eB ire a 
d 
ck a ae ee il it See 
— a ae aT ae oe 


27. For each matrix A in Exercise 26, find a basis {v_i} for Range(A)
[= Col(A)] and a basis {w_i} for Null(A^T). Then verify that the v_i are
orthogonal to the w_i, as required by Theorem 7, part (ii).


28. For each of the following matrices A, express the 1's vector 1 as a
unique sum, 1 = x_1 + x_2, of a vector x_1 in Row(A) and a vector x_2 in
Null(A).


Bes ee A hee yt 6 
ON cg, ap Py a) EE le vocal eae £3 
Oi 4 


29. Find a solution to Ax = 1, for A the matrix in Exercise 28, part (c),
in which x is in Row(A).


30. Use Theorem 8 to prove that if v_1, v_2, ..., v_k are a linearly
independent set of vectors in the row space of a matrix A, then w_i = Av_i
are a linearly independent set of vectors in the range of A. Thus, if {v_i}
are a basis for Row(A), then {Av_i} are a basis for Col(A).



Ch. 5 Theory of Systems of Linear Equations and Eigenvalue/Eigenvector Problems 


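The inverse $A^{-1}$ of a matrix A with orthogonal columns $\mathbf{a}_i^C$ is easy to
describe. It is essentially the same as the pseudoinverse: $A^{-1}$ is formed by
dividing each column $\mathbf{a}_i^C$ by $\mathbf{a}_i^C \cdot \mathbf{a}_i^C$, the sum of the squares of its entries,
and forming the transpose of the resulting matrix. Thus, if $s_i = \mathbf{a}_i^C \cdot \mathbf{a}_i^C$,
then

\[ A^{-1} = \begin{bmatrix} \mathbf{a}_1^C/s_1 & \mathbf{a}_2^C/s_2 & \cdots & \mathbf{a}_n^C/s_n \end{bmatrix}^T \tag{1} \]

We verify (1) by noting that entry (i, j) in $A^{-1}A$ will be 0 if $i \ne j$
because $\mathbf{a}_i^C \cdot \mathbf{a}_j^C = 0$ (the columns are orthogonal). Entry (i, i) equals

\[ (\mathbf{a}_i^C/s_i) \cdot \mathbf{a}_i^C = \mathbf{a}_i^C \cdot \mathbf{a}_i^C / (\mathbf{a}_i^C \cdot \mathbf{a}_i^C) = 1. \]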


Example 1. Inverse of Matrix with
Orthogonal Columns

(i) Consider the matrix

\[ A = \begin{bmatrix} 3 & -4 \\ 4 & 3 \end{bmatrix} \]

whose columns are orthogonal. The sum of the squares of the entries in
each column of A is $3^2 + 4^2 = 25$. If we divide each column by 25 and
take the transpose, we obtain

\[ A^{-1} = \begin{bmatrix} 3/25 & 4/25 \\ -4/25 & 3/25 \end{bmatrix} \]

The reader should check that this matrix is exactly what one would
get by computing this 2-by-2 inverse using elimination.

(ii) Consider the orthogonal-column matrix

\[ A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & -1 & 1 \\ 1 & -1 & -1 \end{bmatrix} \]

Its inverse, by (1), is

\[ A^{-1} = \begin{bmatrix} 2/6 & 1/6 & 1/6 \\ 1/3 & -1/3 & -1/3 \\ 0 & 1/2 & -1/2 \end{bmatrix} \]

Again the reader should check that $A^{-1}A = I$.
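
Formula (1) is short enough to try out directly. The following is a minimal sketch, not part of the original text: the function names `inverse_orthogonal_columns` and `matmul` are hypothetical, and the matrix is the 3-by-3 matrix of part (ii), read here as having columns [2, 1, 1], [1, -1, -1], and [0, 1, -1]. Exact fractions are used so that the check $A^{-1}A = I$ involves no roundoff.

```python
from fractions import Fraction

def inverse_orthogonal_columns(A):
    """Inverse of a square matrix with mutually orthogonal columns, by formula (1)."""
    n = len(A)
    # Pull out the columns of A as lists of exact fractions.
    columns = [[Fraction(A[i][j]) for i in range(n)] for j in range(n)]
    inverse = []
    for col in columns:
        s = sum(x * x for x in col)            # s_i = (column i) . (column i)
        inverse.append([x / s for x in col])   # row i of the inverse is column i / s_i
    return inverse

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 1, 0],
     [1, -1, 1],
     [1, -1, -1]]
A_inv = inverse_orthogonal_columns(A)
identity = matmul(A_inv, A)   # should be the 3-by-3 identity
```

The sketch assumes the columns really are orthogonal; it does not test for that, and for a general matrix the result would not be an inverse.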
Let us use (1) to obtain a formula for the ith component $x_i$ in the
solution x to Ax = b. Given the inverse $A^{-1}$, we can find x as $\mathbf{x} = A^{-1}\mathbf{b}$.
The ith component in $A^{-1}\mathbf{b}$ is the scalar product of the ith row of $A^{-1}$ with b.
By (1), the ith row of $A^{-1}$ is $\mathbf{a}_i^C/(\mathbf{a}_i^C \cdot \mathbf{a}_i^C)$ and thus

\[ x_i = \frac{\mathbf{a}_i^C \cdot \mathbf{b}}{\mathbf{a}_i^C \cdot \mathbf{a}_i^C} \tag{2} \]

Our old friend, the length of the projection of b onto column $\mathbf{a}_i^C$ (see
Theorem 2 of Section 5.3).

A set of orthogonal vectors of unit length (whose norm is 1) is called
orthonormal. The preceding formulas for $x_i$ and $A^{-1}$ become even nicer if
the columns of A are orthonormal. In this case, $\mathbf{a}_i^C \cdot \mathbf{a}_i^C = 1$. Then the
denominator in (2) is 1, so now the projection formula is $x_i = \mathbf{a}_i^C \cdot \mathbf{b}$. To
obtain $A^{-1}$, we divide each column of A by 1 and form the transpose: that
is, $A^{-1} = A^T$. Summarizing this discussion, we have


Theorem 1

(i) If A is an n-by-n matrix whose columns are orthogonal, then $A^{-1}$
is obtained by dividing the ith column of A by the sum of the
squares of its entries and transposing the resulting matrix [see (1)].
The ith component $x_i$ in the solution of Ax = b is the length of
the projection of b on $\mathbf{a}_i^C$: $x_i = \mathbf{a}_i^C \cdot \mathbf{b} / \mathbf{a}_i^C \cdot \mathbf{a}_i^C$.

(ii) If the columns of A are orthonormal, then the inverse $A^{-1}$ is $A^T$
and the length of the projection is just $x_i = \mathbf{a}_i^C \cdot \mathbf{b}$.


Suppose that we have a basis of n orthogonal vectors $\mathbf{q}_i$ for n-space.
If Q has the $\mathbf{q}_i$ as its columns, the solution $\mathbf{x} = \mathbf{b}^*$ of $Q\mathbf{x} = \mathbf{b}$ will be a
vector $\mathbf{b}^*$ of lengths of the projections of b onto each $\mathbf{q}_i$:

\[ Q\mathbf{b}^* = b_1^*\mathbf{q}_1 + b_2^*\mathbf{q}_2 + \cdots + b_n^*\mathbf{q}_n = \mathbf{b} \tag{3} \]

Here the term $b_i^*\mathbf{q}_i$ is just the projection of b onto $\mathbf{q}_i$. So (3) simply says


Corollary. Any n-vector b can be expressed as the sum of the projections
of b onto a set of n orthogonal vectors $\mathbf{q}_i$.


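Example 2. Conversion of Coordinates from
One Basis to Another


Consider the orthonormal basis $\mathbf{q}_1 = [.8, .6]$, $\mathbf{q}_2 = [-.6, .8]$ for
2-space. To express the vector b = [1, 2] in terms of $\mathbf{q}_1$, $\mathbf{q}_2$ coordinates,
we need to solve the system

\[ \begin{bmatrix} .8 & -.6 \\ .6 & .8 \end{bmatrix} \begin{bmatrix} b_1^* \\ b_2^* \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad\text{or}\quad b_1^* \begin{bmatrix} .8 \\ .6 \end{bmatrix} + b_2^* \begin{bmatrix} -.6 \\ .8 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]

or $Q\mathbf{b}^* = \mathbf{b}$, where $Q = [\mathbf{q}_1\ \mathbf{q}_2]$. By Theorem 1,

\[ b_1^* = \mathbf{q}_1 \cdot \mathbf{b} = .8 \times 1 + .6 \times 2 = 2, \qquad b_2^* = \mathbf{q}_2 \cdot \mathbf{b} = -.6 \times 1 + .8 \times 2 = 1 \]

Thus $\mathbf{b} = 2\mathbf{q}_1 + 1\mathbf{q}_2$, where $b_1^*\mathbf{q}_1 = 2[.8, .6]$ is the projection of b on $\mathbf{q}_1$, and
$b_2^*\mathbf{q}_2 = [-.6, .8]$ is the projection of b on $\mathbf{q}_2$. Thus b = [1, 2] is expressed
as an $\mathbf{e}_1$-$\mathbf{e}_2$ coordinate vector, while $\mathbf{b}^* = [2, 1]$ is the same vector
expressed in $\mathbf{q}_1$-$\mathbf{q}_2$ coordinates. A geometric picture of this conversion
is given in Figure 5.6, where b is depicted as the sum of its
projections onto $\mathbf{q}_1$ and onto $\mathbf{q}_2$.

Figure 5.6 [Figure: the vector [1, 2] drawn against the [.8, .6]-axis and
the [-.6, .8]-axis; $b_1^* = 2$ and $b_2^* = 1$ are the projections of [1, 2] onto
[.8, .6] and [-.6, .8].]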
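
The conversion in Example 2 amounts to two dot products. A brief sketch follows, using the vectors of the example; the helper name `dot` and the variable names are illustrative choices, not notation from the text.

```python
def dot(u, v):
    """Scalar product of two vectors given as lists."""
    return sum(x * y for x, y in zip(u, v))

q1 = [0.8, 0.6]
q2 = [-0.6, 0.8]
b = [1.0, 2.0]

# By Theorem 1(ii), the q-coordinates are just the dot products q_i . b.
b_star = [dot(q1, b), dot(q2, b)]

# Rebuild b as the sum of its projections b1* q1 + b2* q2, as in (3).
rebuilt = [b_star[0] * q1[i] + b_star[1] * q2[i] for i in range(2)]
```

Because floating-point arithmetic is used here, `b_star` and `rebuilt` agree with [2, 1] and [1, 2] only up to roundoff.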


Theorem 1 is a carbon copy of Theorem 5 of Section 5.3 about
pseudoinverses when columns are orthogonal. As with the inverse, if A's
columns are orthonormal, the pseudoinverse $A^+$ of A will simply be $A^T$. The
following example gives a familiar illustration of this result and shows why
orthogonal columns make inverses and pseudoinverses so similar.


Example 3. Pseudoinverse of Matrix with
Orthonormal Columns


Let $I_2$ be the first two columns of the 3-by-3 identity matrix.

\[ I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \]

Then

\[ I_2^+ = I_2^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \]

For any vector $\mathbf{b} = [b_1, b_2, b_3]$, the least-squares solution $\mathbf{x} = \mathbf{b}^*$ to
$I_2\mathbf{x} = \mathbf{b}$ is

\[ \mathbf{b}^* = I_2^T\mathbf{b} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \]

This result confirms our intuitive notion that $[b_1, b_2, 0]$ is the closest
point in the x-y plane to the point $[b_1, b_2, b_3]$.


Optional 
There is another interesting geometric fact about orthonormal columns 
(see the Exercises for the two-dimensional case). 


Theorem 2. When Q has orthonormal columns, then solving Qx = b for
$\mathbf{b}^* = Q^T\mathbf{b}$ is equivalent to performing the orthonormal change of basis
$\mathbf{b} \to \mathbf{b}^* = Q^T\mathbf{b}$. Such a basis change is simply a rotation of the
coordinate axes, a reflection through a plane, or a combination of both.
The entries in Q can be expressed in terms of the sines and cosines of
the angles of this rotation.


For example, the rotation of axes in the plane by $\theta°$ is a linear
transformation R of 2-space:

\[ R: \quad \begin{aligned} x' &= x\cos\theta° + y\sin\theta° \\ y' &= -x\sin\theta° + y\cos\theta° \end{aligned} \qquad\text{or}\qquad \mathbf{u}' = A\mathbf{u} \]

where

\[ A = \begin{bmatrix} \cos\theta° & \sin\theta° \\ -\sin\theta° & \cos\theta° \end{bmatrix} \]

It is easy to check that A has orthonormal columns.


It follows that the distance between a pair of vectors and the angle that
they form do not change with an orthonormal change of basis.


(Note: End of optional material.) 


Orthogonal columns have another important advantage besides easy 
formulas. A highly nonorthogonal set of columns—that is, columns that are 
almost parallel—can result in unstable computations. 


Example 4. Nonorthogonal Columns


Consider the following system of equations:

\[ \begin{aligned} x_1 + .75x_2 &= 5 \\ x_1 + x_2 &= 7 \end{aligned} \tag{4} \]

Let us call the two column vectors in the coefficient matrix of (4):
u = [1, 1] and v = [.75, 1]. The cosine of their angle is, by Theorem
6 of Section 5.3,

\[ \cos\theta(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}||\mathbf{v}|} = \frac{1.75}{\sqrt{2} \cdot 1.25} = .99 \tag{5} \]

The angle with cosine of .99 is 8°. Thus u and v are almost parallel
(almost the same vector). Representing any 2-vector b as a linear
combination of two vectors that are almost the same is tricky, that is,
unstable. For example, to solve (4) we must find weights $x_1$, $x_2$
such that

\[ x_1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + x_2 \begin{bmatrix} .75 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 7 \end{bmatrix} \tag{6} \]

The system (4) is the canoe-with-sail system from Section 1.1. We
already know that calculations with A, the coefficient matrix in (4),
are very unstable. In Section 3.5 we computed the condition number
of A to be c(A) = 16. Recall that the condition number $c(A) = \|A\| \cdot \|A^{-1}\|$
measures how much a relative error in the entries of A (or
in b) could affect the relative error in $\mathbf{x} = [x_1, x_2]$; in this case, a 5%
error in b could cause an error 16 [= c(A)] times greater in x, a
16 × 5% = 80% error.


We solved (4) in Section 1.1 and obtained $x_1 = -1$, $x_2 = 8$.
If we had solved for b' = [7, 5], we would have obtained the answer
$x_1 = 13$, $x_2 = -8$ (see Figure 5.7 for a picture of this result). Or for
b'' = [6, 6], $x_1 = 6$, $x_2 = 0$.
Figure 5.7 [Figure: the vector [7, 5] drawn as the sum 13[1, 1] - 8[.75, 1].]
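
The instability in Example 4 can be reproduced in a few lines. The sketch below is an illustration only: it solves $x_1\mathbf{u} + x_2\mathbf{v} = \mathbf{b}$ by Cramer's rule in exact arithmetic (a convenient choice for a 2-by-2 system, not the elimination method used earlier in the book), and the name `solve_2x2` is hypothetical.

```python
from fractions import Fraction
import math

def solve_2x2(u, v, b):
    """Solve x1*u + x2*v = b for (x1, x2) by Cramer's rule."""
    det = u[0] * v[1] - v[0] * u[1]
    x1 = (b[0] * v[1] - v[0] * b[1]) / det
    x2 = (u[0] * b[1] - b[0] * u[1]) / det
    return x1, x2

u = [Fraction(1), Fraction(1)]
v = [Fraction(3, 4), Fraction(1)]

# Cosine of the angle between the columns, as in (5); |u| = sqrt(2), |v| = 1.25.
cos_angle = float(u[0] * v[0] + u[1] * v[1]) / (math.sqrt(2) * 1.25)

# Three nearby right-hand sides give three very different solutions.
solutions = {rhs: solve_2x2(u, v, [Fraction(rhs[0]), Fraction(rhs[1])])
             for rhs in [(5, 7), (7, 5), (6, 6)]}
```

Moving b from [5, 7] to [6, 6] to [7, 5] changes each entry of b by at most 2, yet $x_1$ moves from -1 to 13 and $x_2$ from 8 to -8.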
Reading the results of Example 4 in reverse, we see that when errors
arise in solving an ill-conditioned system of equations Ax = b (in which A 
has a large condition number), the problem should be that some column 
vector (or a linear combination of them) forms a small angle with another 
column vector—this means that the columns are almost linearly dependent. 
If the columns were close to mutually orthogonal, the system Ax = b would 
be well-conditioned. 


Principle. Let A be an n-by-n matrix with rank(A)= n so that the system of 
equations Ax = b has a unique solution. The solution to Ax = b will 
be more or less stable according to how close or far from orthogonal 
the column vectors of A are. 


Suppose that the columns of the n-by-n matrix A are linearly inde- 
pendent but not orthogonal. We shall show how to find a new n-by-n matrix 
A* of orthonormal columns (orthogonal and unit length) that are linear com- 
binations of the columns of A. 

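Our procedure can be applied to any basis $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m$ of an
m-dimensional space V and will yield a new basis of m orthonormal vectors $\mathbf{q}_i$
for V (unit-length vectors make calculations especially simple). The
procedure is inductive in the sense that the first k $\mathbf{q}_i$ will be an orthonormal
basis for the space $V_k$ generated by the first k $\mathbf{a}_i$. The method is called
Gram-Schmidt orthogonalization.

For k = 1, $\mathbf{q}_1$ should be a multiple of $\mathbf{a}_1$. To make $\mathbf{q}_1$ have norm 1,
we set $\mathbf{q}_1 = \mathbf{a}_1/|\mathbf{a}_1|$. Next we must construct from $\mathbf{a}_2$ a second unit vector
$\mathbf{q}_2$ orthogonal to $\mathbf{q}_1$. We divide $\mathbf{a}_2$ into two "parts": the part of $\mathbf{a}_2$ parallel
to $\mathbf{q}_1$ and the part of $\mathbf{a}_2$ orthogonal (perpendicular) to $\mathbf{q}_1$ (see Figure 5.8).
The component of $\mathbf{a}_2$ in $\mathbf{q}_1$'s direction is simply the projection of $\mathbf{a}_2$ onto
$\mathbf{q}_1$. This projection is $s\mathbf{q}_1$, where the length s of the projection is

\[ s = \frac{\mathbf{a}_2 \cdot \mathbf{q}_1}{\mathbf{q}_1 \cdot \mathbf{q}_1} = \mathbf{a}_2 \cdot \mathbf{q}_1 \tag{7} \]

since $\mathbf{q}_1 \cdot \mathbf{q}_1 = 1$. The rest of $\mathbf{a}_2$, the vector $\mathbf{a}_2 - s\mathbf{q}_1$, is orthogonal
to the projection $s\mathbf{q}_1$, and hence orthogonal to $\mathbf{q}_1$. So $\mathbf{a}_2 - s\mathbf{q}_1$ is the
orthogonal vector we want for $\mathbf{q}_2$. To have unit norm, we set
$\mathbf{q}_2 = (\mathbf{a}_2 - s\mathbf{q}_1)/|\mathbf{a}_2 - s\mathbf{q}_1|$.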

Let us show how the procedure works thus far. 


Figure 5.8 Gram-Schmidt orthogonalization. [Figure labels: $\mathbf{a}_1 = [3, 4]$;
$\mathbf{q}_2 = (\mathbf{a}_2 - s\mathbf{q}_1)/|\mathbf{a}_2 - s\mathbf{q}_1|$.]
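Example 5. Gram-Schmidt Orthogonalization
in Two Dimensions


Suppose that $\mathbf{a}_1 = [3, 4]$ and $\mathbf{a}_2 = [2, 1]$ (see Figure 5.8). We set

\[ \mathbf{q}_1 = \frac{\mathbf{a}_1}{|\mathbf{a}_1|} = \frac{[3, 4]}{5} = \left[\tfrac{3}{5}, \tfrac{4}{5}\right] \]

We project $\mathbf{a}_2$ onto $\mathbf{q}_1$ to get the part of $\mathbf{a}_2$ parallel to $\mathbf{q}_1$. From (7),
the length of the projection is

\[ s = \mathbf{a}_2 \cdot \mathbf{q}_1 = [2, 1] \cdot \left[\tfrac{3}{5}, \tfrac{4}{5}\right] = \tfrac{10}{5} = 2 \]

and the projection is $s\mathbf{q}_1 = 2\left[\tfrac{3}{5}, \tfrac{4}{5}\right] = \left[\tfrac{6}{5}, \tfrac{8}{5}\right]$. Next we determine the
other part of $\mathbf{a}_2$, the part orthogonal to $s\mathbf{q}_1$:

\[ \mathbf{a}_2 - s\mathbf{q}_1 = [2, 1] - \left[\tfrac{6}{5}, \tfrac{8}{5}\right] = \left[\tfrac{4}{5}, -\tfrac{3}{5}\right] \]

Since $\left|\left[\tfrac{4}{5}, -\tfrac{3}{5}\right]\right| = 1$, then $\mathbf{q}_2 = \mathbf{a}_2 - s\mathbf{q}_1 = \left[\tfrac{4}{5}, -\tfrac{3}{5}\right]$.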


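We extend the previous construction by finding the projections of $\mathbf{a}_3$
onto $\mathbf{q}_1$ and $\mathbf{q}_2$. Then the vector $\mathbf{a}_3 - s_1\mathbf{q}_1 - s_2\mathbf{q}_2$, which is orthogonal to
$\mathbf{q}_1$ and $\mathbf{q}_2$, should be $\mathbf{q}_3$; as before, we divide $\mathbf{a}_3 - s_1\mathbf{q}_1 - s_2\mathbf{q}_2$ by its norm
to make $\mathbf{q}_3$ unit length. We continue this process to find $\mathbf{q}_4$, $\mathbf{q}_5$, and so on.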


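Example 6. Gram-Schmidt Orthogonalization
of 3-by-3 Matrix


Let us perform orthogonalization on the matrix A whose ith column
we denote by $\mathbf{a}_i$.

\[ A = \begin{bmatrix} 0 & 3 & 2 \\ 3 & 5 & 5 \\ 4 & 0 & 5 \end{bmatrix} \tag{8} \]

First $\mathbf{q}_1 = \mathbf{a}_1/|\mathbf{a}_1| = [0, 3, 4]/5 = \left[0, \tfrac{3}{5}, \tfrac{4}{5}\right]$.
The length of the projection of $\mathbf{a}_2$ onto $\mathbf{q}_1$ is

\[ s = \mathbf{a}_2 \cdot \mathbf{q}_1 = 3 \cdot 0 + 5 \cdot \tfrac{3}{5} + 0 \cdot \tfrac{4}{5} = 3 \tag{9a} \]

So the projection of $\mathbf{a}_2$ onto $\mathbf{q}_1$ is $s\mathbf{q}_1 = 3\left[0, \tfrac{3}{5}, \tfrac{4}{5}\right] = \left[0, \tfrac{9}{5}, \tfrac{12}{5}\right]$.
Next we compute

\[ \mathbf{a}_2 - s\mathbf{q}_1 = [3, 5, 0] - \left[0, \tfrac{9}{5}, \tfrac{12}{5}\right] = \left[3, \tfrac{16}{5}, -\tfrac{12}{5}\right] \]

where $|\mathbf{a}_2 - s\mathbf{q}_1| = \sqrt{9 + 256/25 + 144/25} = 5$. Then

\[ \mathbf{q}_2 = \frac{\mathbf{a}_2 - s\mathbf{q}_1}{|\mathbf{a}_2 - s\mathbf{q}_1|} = \left[\tfrac{3}{5}, \tfrac{16}{25}, -\tfrac{12}{25}\right] \]

We compute the length of the projections of $\mathbf{a}_3$ onto $\mathbf{q}_1$ and $\mathbf{q}_2$:

\[ \begin{aligned} s_1 &= \mathbf{a}_3 \cdot \mathbf{q}_1 = 2 \cdot 0 + 5 \cdot \tfrac{3}{5} + 5 \cdot \tfrac{4}{5} = 7 \\ s_2 &= \mathbf{a}_3 \cdot \mathbf{q}_2 = 2 \cdot \tfrac{3}{5} + 5 \cdot \tfrac{16}{25} + 5 \cdot \left(-\tfrac{12}{25}\right) = 2 \end{aligned} \tag{9b} \]

Then

\[ s_1\mathbf{q}_1 = 7\left[0, \tfrac{3}{5}, \tfrac{4}{5}\right] = \left[0, \tfrac{21}{5}, \tfrac{28}{5}\right], \qquad s_2\mathbf{q}_2 = 2\left[\tfrac{3}{5}, \tfrac{16}{25}, -\tfrac{12}{25}\right] = \left[\tfrac{6}{5}, \tfrac{32}{25}, -\tfrac{24}{25}\right] \]

and

\[ \mathbf{a}_3 - s_1\mathbf{q}_1 - s_2\mathbf{q}_2 = [2, 5, 5] - \left[0, \tfrac{21}{5}, \tfrac{28}{5}\right] - \left[\tfrac{6}{5}, \tfrac{32}{25}, -\tfrac{24}{25}\right] = \left[\tfrac{4}{5}, -\tfrac{12}{25}, \tfrac{9}{25}\right] \]

Since computation reveals that $|\mathbf{a}_3 - s_1\mathbf{q}_1 - s_2\mathbf{q}_2| = 1$, then

\[ \mathbf{q}_3 = \mathbf{a}_3 - s_1\mathbf{q}_1 - s_2\mathbf{q}_2 = \left[\tfrac{4}{5}, -\tfrac{12}{25}, \tfrac{9}{25}\right] \]

The matrix of these new orthogonal column vectors is

\[ Q = \begin{bmatrix} 0 & \tfrac{3}{5} & \tfrac{4}{5} \\ \tfrac{3}{5} & \tfrac{16}{25} & -\tfrac{12}{25} \\ \tfrac{4}{5} & -\tfrac{12}{25} & \tfrac{9}{25} \end{bmatrix} \tag{10} \]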
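
The steps of Examples 5 and 6 can be collected into a short routine. What follows is a sketch of the basic procedure described above, in floating-point arithmetic; the function name `gram_schmidt` and the tolerance `tol` (used to decide when a leftover vector counts as zero) are assumptions made for this illustration.

```python
import math

def dot(u, v):
    """Scalar product of two vectors given as lists."""
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Return orthonormal vectors spanning the same space as `vectors`."""
    qs = []
    for a in vectors:
        rest = list(a)
        for q in qs:
            s = dot(a, q)                              # length of projection of a onto q
            rest = [r - s * x for r, x in zip(rest, q)]
        norm = math.sqrt(dot(rest, rest))
        if norm > tol:                                 # skip a vector dependent on earlier ones
            qs.append([r / norm for r in rest])
    return qs

# Columns a1, a2, a3 of the matrix A in Example 6.
columns = [[0, 3, 4], [3, 5, 0], [2, 5, 5]]
q1, q2, q3 = gram_schmidt(columns)
```

Up to roundoff, `q1`, `q2`, `q3` should reproduce the columns of Q in (10). The fixed tolerance is a crude device; a careful implementation would scale it to the size of the input vectors.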


In keeping with the principle above, the accuracy of this procedure
depends on how close to or far from orthogonality the columns $\mathbf{a}_i$ are. If
a linear combination of some $\mathbf{a}_i$ forms a small angle with another vector $\mathbf{a}_j$
(this means the matrix A has a large condition number), then the resulting
$\mathbf{q}_i$ will have errors, making them not exactly orthogonal. However, more
stable methods are available using advanced techniques, such as Householder
transformations.
Suppose that the columns of A are not linearly independent. If, say,
$\mathbf{a}_3$ is a linear combination of $\mathbf{a}_1$ and $\mathbf{a}_2$, then in the Gram-Schmidt procedure
the error vector $\mathbf{a}_3 - s_1\mathbf{q}_1 - s_2\mathbf{q}_2$ with respect to $\mathbf{q}_1$ and $\mathbf{q}_2$ will be 0. In
this case we skip $\mathbf{a}_3$ and use $\mathbf{a}_4 - s_1\mathbf{q}_1 - s_2\mathbf{q}_2$ to define $\mathbf{q}_3$. The number of
vectors $\mathbf{q}_i$ formed will be the dimension of the column space of A, that is,
rank(A).

The effect of the orthogonalization process can be represented by an 
upper triangular matrix R so that one obtains the matrix factorization 


Theorem 3. Any m-by-n matrix A can be factored in the form

\[ A = QR \tag{11} \]

where Q is the m-by-rank(A) matrix with orthonormal columns $\mathbf{q}_i$
obtained by Gram-Schmidt orthogonalization, and R is an upper
triangular matrix of size rank(A)-by-n (described below).


For $i < j$, entry $r_{ij}$ of R is $\mathbf{a}_j \cdot \mathbf{q}_i$, the length of the projection of $\mathbf{a}_j$ onto $\mathbf{q}_i$. The
diagonal entries in R are the sizes, before normalization, of the new columns:
$r_{11} = |\mathbf{a}_1|$, $r_{22} = |\mathbf{a}_2 - s\mathbf{q}_1|$, $r_{33} = |\mathbf{a}_3 - s_1\mathbf{q}_1 - s_2\mathbf{q}_2|$, and so on.


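Example 7. QR Decomposition
Give the QR decomposition for the matrix A in Example 6.

\[ A = \begin{bmatrix} 0 & 3 & 2 \\ 3 & 5 & 5 \\ 4 & 0 & 5 \end{bmatrix} \]

The orthonormal matrix Q is given in (10). We form R from the
information about the sizes of new columns and the projections as
described in the preceding paragraph. Here $r_{12} = s = 3$ in (9a), and
$r_{13} = s_1 = 7$, $r_{23} = s_2 = 2$ in (9b). Then

\[ QR = \begin{bmatrix} 0 & \tfrac{3}{5} & \tfrac{4}{5} \\ \tfrac{3}{5} & \tfrac{16}{25} & -\tfrac{12}{25} \\ \tfrac{4}{5} & -\tfrac{12}{25} & \tfrac{9}{25} \end{bmatrix} \begin{bmatrix} 5 & 3 & 7 \\ 0 & 5 & 2 \\ 0 & 0 & 1 \end{bmatrix} \]

Let us compute the second column of QR, multiplying Q by
$\mathbf{r}_2^C$, the second column of R, and show that the result is $\mathbf{a}_2$, the second
column of A.

\[ Q\mathbf{r}_2^C = \begin{bmatrix} 0 & \tfrac{3}{5} & \tfrac{4}{5} \\ \tfrac{3}{5} & \tfrac{16}{25} & -\tfrac{12}{25} \\ \tfrac{4}{5} & -\tfrac{12}{25} & \tfrac{9}{25} \end{bmatrix} \begin{bmatrix} 3 \\ 5 \\ 0 \end{bmatrix} = 3\begin{bmatrix} 0 \\ \tfrac{3}{5} \\ \tfrac{4}{5} \end{bmatrix} + 5\begin{bmatrix} \tfrac{3}{5} \\ \tfrac{16}{25} \\ -\tfrac{12}{25} \end{bmatrix} + 0\begin{bmatrix} \tfrac{4}{5} \\ -\tfrac{12}{25} \\ \tfrac{9}{25} \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \\ 0 \end{bmatrix} = \mathbf{a}_2 \tag{12} \]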
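
Since entry $r_{ij}$ of R is a dot product of a column of Q with a column of A, the whole matrix R can be obtained at once as $Q^TA$. The short check below illustrates this for Example 7 in exact arithmetic; the helper names `transpose` and `matmul` are illustrative only.

```python
from fractions import Fraction as F

A = [[0, 3, 2],
     [3, 5, 5],
     [4, 0, 5]]

# The matrix Q of (10), entered as exact fractions.
Q = [[F(0),    F(3, 5),    F(4, 5)],
     [F(3, 5), F(16, 25),  F(-12, 25)],
     [F(4, 5), F(-12, 25), F(9, 25)]]

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

R = matmul(transpose(Q), A)   # entry (i, j) is q_i . a_j
QR = matmul(Q, R)             # should reproduce A exactly
```

The entries of R below the diagonal come out as 0 because each $\mathbf{q}_i$ is orthogonal to every earlier column of A.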
Columns of Q are obtained from linear combinations of the columns
of A. Reversing this procedure yields the columns of A as linear
combinations of the columns of Q. This reversal is what is accomplished by the
matrix product QR. Consider the computation in (12). In terms of the
columns $\mathbf{q}_i$ of Q, (12) is

\[ 3\mathbf{q}_1 + 5\mathbf{q}_2 + 0\mathbf{q}_3 = \mathbf{a}_2 \]

or, in terms of R,

\[ r_{12}\mathbf{q}_1 + r_{22}\mathbf{q}_2 = \mathbf{a}_2 \tag{13} \]

($\mathbf{a}_2$ equals its projection onto $\mathbf{q}_1$ plus its projection onto $\mathbf{q}_2$).


Next consider the formula for $\mathbf{q}_2$:

\[ \mathbf{q}_2 = \frac{\mathbf{a}_2 - s\mathbf{q}_1}{|\mathbf{a}_2 - s\mathbf{q}_1|} = \frac{\mathbf{a}_2 - r_{12}\mathbf{q}_1}{r_{22}} \tag{14} \]

since $r_{22} = |\mathbf{a}_2 - s\mathbf{q}_1|$ and $r_{12} = s$. Solving for $\mathbf{a}_2$ in (14), we obtain (13):

\[ \mathbf{q}_2 = \frac{\mathbf{a}_2 - r_{12}\mathbf{q}_1}{r_{22}} \quad\Rightarrow\quad r_{12}\mathbf{q}_1 + r_{22}\mathbf{q}_2 = \mathbf{a}_2 \]

The same analysis shows that the jth column in the product QR is just a
reversal of the orthogonalization steps for finding $\mathbf{q}_j$.

The matrix R is upper triangular because column $\mathbf{a}_i$ is only involved
in building columns $\mathbf{q}_i, \mathbf{q}_{i+1}, \ldots, \mathbf{q}_n$ of Q. The QR decomposition is the
column counterpart to the LU decomposition, given in Section 3.2, in which
the row combinations of Gaussian elimination are reversed to obtain the
matrix A from its row-reduced matrix U.

The QR decomposition is used frequently in numerical procedures. 
We use it to find eigenvalues in the appendix to Section 5.5. 

We will sketch one of its most frequent uses, finding the inverse or
pseudoinverse of an ill-conditioned matrix. If A is an n-by-n matrix with
linearly independent columns, the decomposition A = QR yields

\[ A^{-1} = (QR)^{-1} = R^{-1}Q^{-1} = R^{-1}Q^T \tag{15} \]

The fact that $Q^{-1} = Q^T$ when Q has orthonormal columns was part of
Theorem 1. Given the QR decomposition of A, (15) says that to get $A^{-1}$,
we only need to determine $R^{-1}$. Since R is an upper triangular matrix, its
inverse is obtained quickly by back substitution (see Exercise 12 of Section
3.5). When A is very ill-conditioned, one should compute $A^{-1}$ via (15):
first, determining the QR decomposition of A, using advanced (more stable)
variations of the Gram-Schmidt procedure; then determining $R^{-1}$; and thus
obtaining $A^{-1} = R^{-1}Q^T$.

Equation (15) extends to pseudoinverses. That is, if A is an m-by-n
matrix with linearly independent columns and m > n, then its pseudoinverse
$A^+$ can be computed as

\[ A^+ = R^{-1}Q^T \tag{16} \]

See the Exercises for instructions on how to verify (16) and examples of its
use. This formula for the pseudoinverse is the standard way pseudoinverses
are computed in practice. Even if one determines Q and R using the basic
Gram-Schmidt procedure given above, the resulting $A^+$ from (16) will be
substantially more accurate than computing $A^+$ using the standard formula
$A^+ = (A^TA)^{-1}A^T$, because the matrix $A^TA$ tends to be ill-conditioned. For
example, in the least-squares polynomial-fitting problem in Example 5 of
Section 5.4, the condition number of the 3-by-3 matrix $X^TX$ was around
2000!


Principle. Because of conditioning problems, the pseudoinverse $A^+$ of a
matrix A should be computed by the formula $A^+ = R^{-1}Q^T$, where
Q and R are the matrices in the QR decomposition of A.
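
In practice one rarely forms $R^{-1}$ explicitly: the least-squares solution $\mathbf{x} = A^+\mathbf{b} = R^{-1}Q^T\mathbf{b}$ is found by solving the triangular system $R\mathbf{x} = Q^T\mathbf{b}$ with back substitution. The sketch below illustrates this with the basic Gram-Schmidt procedure; the function names and the small data set (fitting a line $c_0 + c_1t$ to the points (0, 1), (1, 2), (2, 4)) are made up for the illustration, and the columns passed in are assumed to be linearly independent.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def qr_gram_schmidt(columns):
    """Basic Gram-Schmidt QR; returns the columns of Q and the matrix R."""
    n = len(columns)
    qs = []
    R = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(columns):
        rest = list(a)
        for i, q in enumerate(qs):
            R[i][j] = dot(a, q)                        # r_ij = a_j . q_i
            rest = [r - R[i][j] * x for r, x in zip(rest, q)]
        R[j][j] = math.sqrt(dot(rest, rest))           # size before normalization
        qs.append([r / R[j][j] for r in rest])
    return qs, R

def least_squares(columns, b):
    """Solve R x = Q^T b by back substitution."""
    qs, R = qr_gram_schmidt(columns)
    y = [dot(q, b) for q in qs]                        # y = Q^T b
    n = len(R)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x

columns = [[1.0, 1.0, 1.0], [0.0, 1.0, 2.0]]   # the 1's vector and the t-values
b = [1.0, 2.0, 4.0]
x = least_squares(columns, b)                   # [c0, c1]
```

For this data the normal equations give $c_0 = 5/6$ and $c_1 = 3/2$, which is what the routine should return up to roundoff.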


We now introduce a very different use of orthogonality. Our goal is
to make a vector space for the set of all continuous functions. To make
matters a little easier, let us focus on functions that can be expressed as a
polynomial or infinite series in powers of x, such as $x^3 + 3x^2 - 4x + 1$
or $e^x$ or sin x.

Recall that the defining property of a vector space V is that if u and v 
are in V, then ru + sv is also in V, for any scalars r, s. Clearly, linear 
combinations of polynomials (or infinite series) are again polynomials (or 
infinite series), so these functions form a vector space. 

For a vector space of functions to be useful, we need a coordinate
system, that is, a basis of independent functions $u_i(x)$ (functions that are not
linearly dependent on each other) so that any function f(x) can be expressed
as a linear combination of these basis functions.

\[ f(x) = f_1u_1(x) + f_2u_2(x) + \cdots \tag{17} \]


This basis will need to be infinite and the linear combinations of basis 
functions may also be infinite. The best basis would use orthogonal, or even 
better, orthonormal functions. 

To make an orthogonal basis, we first need to extend the definition of
a scalar, or inner, product $\mathbf{c} \cdot \mathbf{d}$ of vectors to an inner product of functions.
The inner product of two functions f(x) and g(x) on the interval [a, b] is
defined as

\[ f(x) \cdot g(x) = \int_a^b f(x)g(x)\,dx \tag{18} \]

This definition is a natural generalization of the standard inner product $\mathbf{c} \cdot \mathbf{d}$
in that both $\mathbf{c} \cdot \mathbf{d}$ and $f(x) \cdot g(x)$ form sums of term-by-term products of the
respective entities, but in (18) we have a continuous sum, an integral.
With an inner product defined, most of the theory and formulas defined
for vector spaces can be applied to our space of functions. The inner product
tells us when two vectors c, d are orthogonal (if $\mathbf{c} \cdot \mathbf{d} = 0$), and allows us
to compute coordinates $c_i^*$ of c in an orthonormal basis $\mathbf{u}_i$: $c_i^* = \mathbf{c} \cdot \mathbf{u}_i$ (these
coordinates are just the projections of c onto the $\mathbf{u}_i$). We can now do the
same calculations for functions with (18).

The functional equivalent of the euclidean norm is defined by

\[ |f(x)|^2 = f(x) \cdot f(x) = \int_a^b f(x)^2\,dx \tag{19} \]

The counterpart of the sum norm $|\mathbf{c}|_s = \sum |c_i|$ for vectors is
$|f(x)|_s = \int |f(x)|\,dx$.

An orthonormal basis for our functions on the interval [a, b] will be
a set of functions $\{u_i(x)\}$ which are orthogonal [by (18), $\int u_i(x)u_j(x)\,dx = 0$
for all $i \ne j$] and whose norms are 1 [by (19), $\int u_i(x)^2\,dx = 1$]. Given
such an orthonormal basis $\{u_i(x)\}$, the coordinates $f_i$ of a function f(x) in
terms of the $u_i(x)$ are computed by the projection formula $f_i = f(x) \cdot u_i(x)$
used for n-dimensional orthonormal bases:

\[ f(x) = [f(x) \cdot u_1(x)]u_1(x) + [f(x) \cdot u_2(x)]u_2(x) + \cdots \tag{20} \]


How do we find such an orthonormal basis? The first obvious choice
is the set of powers of x: $1, x, x^2, x^3, \ldots$. These are linearly independent;
that is, $x^k$ cannot be expressed as a linear combination of smaller powers of
x. Unfortunately, there is no interval on which 1, x, and $x^2$ are mutually
orthogonal. On [-1, 1], $1 \cdot x = \int_{-1}^{1} x\,dx = 0$ and $x \cdot x^2 = \int_{-1}^{1} x^3\,dx = 0$,
but $1 \cdot x^2 = \int_{-1}^{1} x^2\,dx = \tfrac{2}{3}$.
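
For polynomials, the inner product (18) can be computed exactly, since each power of x integrates in closed form. The following sketch is one way to do this; the function name `inner_product` and the convention of storing a polynomial as a list of coefficients (lowest power first) are choices made for the illustration.

```python
from fractions import Fraction

def inner_product(p, q, a, b):
    """Integral over [a, b] of p(x) q(x), for polynomials given as coefficient lists."""
    # Multiply the two polynomials term by term.
    product = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            product[i + j] += Fraction(pi) * Fraction(qj)
    # Integrate each term c x^k from a to b.
    a, b = Fraction(a), Fraction(b)
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(product))

one, x, x2 = [1], [0, 1], [0, 0, 1]

one_dot_x = inner_product(one, x, -1, 1)     # 1 . x
x_dot_x2 = inner_product(x, x2, -1, 1)       # x . x^2
one_dot_x2 = inner_product(one, x2, -1, 1)   # 1 . x^2, the nonzero one
```

On [0, 1] the same routine gives $1 \cdot x = \tfrac{1}{2}$, so on that interval even 1 and x fail to be orthogonal.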

There are many sets of orthogonal functions that have been developed 
over the years. We shall mention two, Legendre polynomials and Fourier 
trigonometric functions. 

The Gram—Schmidt orthogonalization procedure provides a way to 
build an orthonormal basis out of a basis of linearly independent vectors. 
The calculations in this procedure use inner products, and hence this pro- 
cedure can be applied to the powers of x (which are linearly independent 
but, as we just said, far from orthogonal) to find an orthonormal set of 
polynomials. 

When the interval is [-1, 1], the polynomials obtained by orthogonalization
are called Legendre polynomials $L_k(x)$. Actually, we shall not
worry about making their norms equal to 1. As noted above, the functions
$x^0 = 1$ and x are orthogonal on [-1, 1]. So $L_0(x) = 1$ and $L_1(x) = x$.
Also, $x^2$ is orthogonal to x but not to 1 on [-1, 1]. We must subtract off
the projection of $x^2$ onto 1:

\[ L_2(x) = x^2 - \left(\frac{1 \cdot x^2}{1 \cdot 1}\right)1 = x^2 - \frac{\int_{-1}^{1} x^2\,dx}{\int_{-1}^{1} 1\,dx} = x^2 - \frac{2/3}{2} = x^2 - \frac{1}{3} \tag{21} \]

A similar orthogonalization computation shows that $L_3(x) = x^3 - \tfrac{3}{5}x$.
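
The orthogonalization that produces these polynomials can be carried out mechanically. The sketch below applies the Gram-Schmidt idea to the powers $1, x, x^2, x^3$ with the inner product (18) on [-1, 1], without normalizing (as in the text); the names `inner` and `orthogonal_polynomials` are hypothetical, and polynomials are stored as coefficient lists with the lowest power first.

```python
from fractions import Fraction

def inner(p, q):
    """Inner product (18) on [-1, 1] for polynomials given as coefficient lists."""
    total = Fraction(0)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if (i + j) % 2 == 0:   # odd powers of x integrate to 0 on [-1, 1]
                total += Fraction(pi) * Fraction(qj) * Fraction(2, i + j + 1)
    return total

def orthogonal_polynomials(count):
    """Orthogonalize 1, x, x^2, ... on [-1, 1] (norms are not scaled to 1)."""
    basis = []
    for n in range(count):
        power = [Fraction(0)] * n + [Fraction(1)]   # coefficients of x^n
        poly = list(power)
        for L in basis:
            w = inner(power, L) / inner(L, L)       # projection weight of x^n on L
            for k, coeff in enumerate(L):
                poly[k] -= w * coeff
        basis.append(poly)
    return basis

L0, L1, L2, L3 = orthogonal_polynomials(4)
```

Here `L2` holds the coefficients of $x^2 - \tfrac{1}{3}$ and `L3` those of $x^3 - \tfrac{3}{5}x$, matching (21) and the sentence that follows it.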
Example 8. Approximating $e^x$ by
Legendre Polynomials


Let us use the first four Legendre polynomials $L_0(x) = 1$, $L_1(x) = x$,
$L_2(x) = x^2 - \tfrac{1}{3}$, $L_3(x) = x^3 - 3x/5$ to approximate $e^x$ on the interval
[-1, 1]. We want the first four terms in (20):

\[ e^x \approx w_0L_0(x) + w_1L_1(x) + w_2L_2(x) + w_3L_3(x) = w_0 + w_1x + w_2\left(x^2 - \frac{1}{3}\right) + w_3\left(x^3 - \frac{3x}{5}\right) \tag{22} \]

where $w_i = e^x \cdot L_i(x)/L_i(x) \cdot L_i(x) = \int e^xL_i(x)\,dx / \int L_i(x)^2\,dx$. For
example,

\[ w_2 = \frac{\int_{-1}^{1} e^x(x^2 - \tfrac{1}{3})\,dx}{\int_{-1}^{1} (x^2 - \tfrac{1}{3})^2\,dx} \]

With a little calculus, we compute the $w_i$ to be (approximately)

\[ w_0 = \frac{2.35}{2} = 1.18, \quad w_1 = \frac{.736}{.667} = 1.10, \quad w_2 = \frac{.096}{.178} = .53, \quad w_3 = \frac{.008}{.0457} = .18 \]

Then (22) becomes

\[ e^x \approx 1.18 + 1.10x + .53\left(x^2 - \frac{1}{3}\right) + .18\left(x^3 - \frac{3x}{5}\right) \tag{23} \]

If we collect like powers of x together on the right side, (23) simplifies
to

\[ e^x \approx 1 + x + .53x^2 + .18x^3 \tag{24} \]


Comparing our approximation against the real values of $e^x$ at the
points -1, -.5, 0, .5, 1, we find

A pretty good fit. In particular, it is a better fit on [-1, 1] than simply
using the first terms of the power series for $e^x$, namely, $1 + x +
x^2/2 + x^3/6$. The approximation gets more accurate as more Legendre
polynomials are used.
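
The comparison described in Example 8 is easy to regenerate. The sketch below evaluates the rounded approximation (24) and the four-term power series at the five points and prints them beside $e^x$; the function names are illustrative, and the coefficients .53 and .18 are the rounded values from (24), so the printed numbers reflect that rounding.

```python
import math

def legendre_fit(x):
    """The approximation (24), with its coefficients as rounded in the text."""
    return 1 + x + 0.53 * x**2 + 0.18 * x**3

def power_series(x):
    """First four terms of the power series for e^x."""
    return 1 + x + x**2 / 2 + x**3 / 6

points = [-1, -0.5, 0, 0.5, 1]
rows = [(x, math.exp(x), legendre_fit(x), power_series(x)) for x in points]

print("    x      e^x     (24)   series")
for x, exact, fit, series in rows:
    print(f"{x:5.1f}  {exact:7.4f}  {fit:7.4f}  {series:7.4f}")

# Largest error of each approximation over the five points.
worst_fit = max(abs(fit - exact) for _, exact, fit, _ in rows)
worst_series = max(abs(series - exact) for _, exact, _, series in rows)
```

Near x = 0 the power series is the more accurate of the two; the advantage of (24) shows up toward the ends of the interval, which is why the comparison is made through the largest error over all five points.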


Over the interval $[0, 2\pi]$ the trigonometric functions $(1/\sqrt{\pi}) \sin kx$
and $(1/\sqrt{\pi}) \cos kx$, for k = 1, 2, ..., plus the constant function
$1/\sqrt{2\pi}$ are an orthonormal basis. To verify that they are orthogonal requires
showing that

\[ \frac{1}{\sqrt{\pi}} \sin jx \cdot \frac{1}{\sqrt{\pi}} \cos kx = \frac{1}{\pi} \int_0^{2\pi} \sin jx \cos kx\,dx = 0 \quad \text{for all } j, k \]

\[ \frac{1}{\sqrt{\pi}} \sin jx \cdot \frac{1}{\sqrt{\pi}} \sin kx = \frac{1}{\pi} \int_0^{2\pi} \sin jx \sin kx\,dx = 0 \quad \text{for all } j \ne k \]

\[ \frac{1}{\sqrt{\pi}} \cos jx \cdot \frac{1}{\sqrt{\pi}} \cos kx = \frac{1}{\pi} \int_0^{2\pi} \cos jx \cos kx\,dx = 0 \quad \text{for all } j \ne k \]

plus showing these trigonometric functions are orthogonal to a constant
function. To verify that these trigonometric functions have unit length requires
showing

\[ \frac{1}{\sqrt{\pi}} \sin kx \cdot \frac{1}{\sqrt{\pi}} \sin kx = \frac{1}{\pi} \int_0^{2\pi} \sin^2 kx\,dx = 1 \quad \text{for all } k \]

\[ \frac{1}{\sqrt{\pi}} \cos kx \cdot \frac{1}{\sqrt{\pi}} \cos kx = \frac{1}{\pi} \int_0^{2\pi} \cos^2 kx\,dx = 1 \quad \text{for all } k \]

When $u_{2k-1}(x) = (1/\sqrt{\pi}) \sin kx$ and $u_{2k}(x) = (1/\sqrt{\pi}) \cos kx$, k =
1, 2, ..., and $u_0(x) = 1/\sqrt{2\pi}$ in (20), this representation of f(x) is called
a Fourier series, and the coefficients $f(x) \cdot u_k(x)$ in (20) are called Fourier
coefficients. Using Fourier series, we see that any piecewise continuous
function can be expressed as a linear combination of sine and cosine waves.
One important physical interpretation of this fact is that any complex
electrical signal can be expressed as a sum of simple sinusoidal signals.


Example 9. Fourier Series Representation
of a Jump Function
Let us determine the Fourier series representation of the discontinuous
function: f(x) = 1 for $0 < x \le \pi$ and = 0 for $\pi < x \le 2\pi$. The
Fourier coefficients $f(x) \cdot u_k(x)$ in (20) are

\[ \begin{aligned} f(x) \cdot u_{2k-1}(x) &= f(x) \cdot \frac{1}{\sqrt{\pi}} \sin kx = \frac{1}{\sqrt{\pi}} \int_0^{\pi} \sin kx\,dx = \frac{1}{k\sqrt{\pi}} [-\cos kx]_0^{\pi} = \begin{cases} \dfrac{2}{k\sqrt{\pi}} & k \text{ odd} \\ 0 & k \text{ even} \end{cases} \\ f(x) \cdot u_{2k}(x) &= f(x) \cdot \frac{1}{\sqrt{\pi}} \cos kx = \frac{1}{\sqrt{\pi}} \int_0^{\pi} \cos kx\,dx = \frac{1}{k\sqrt{\pi}} [\sin kx]_0^{\pi} = 0 \end{aligned} \tag{25} \]

Further, we calculate $f(x) \cdot 1/\sqrt{2\pi} = \sqrt{\pi/2}$, so the constant term
of the Fourier series for this f(x) is $(f(x) \cdot u_0(x))u_0(x) = \tfrac{1}{2}$.

By (25), only the odd sine terms occur. Letting an odd k be
written as 2n - 1, we obtain the Fourier series.

\[ f(x) = \frac{1}{2} + \sum_{n=1}^{\infty} \frac{2}{(2n - 1)\pi} \sin [(2n - 1)x] \tag{26} \]

Figure 5.9 shows the approximation to f(x) obtained when the first
three sine terms in (26) are used (dashed line) and when the first eight
sine terms are used. The fit is impressive.


Figure 5.9 Dashed lines use first three trigonometric terms in Fourier series for 
f(x). Solid lines use first eight terms. 
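
The partial sums plotted in Figure 5.9 can be evaluated directly from (26). The following sketch does so at a point of one's choosing; the names `jump` and `partial_sum` are illustrative, and no plotting is attempted.

```python
import math

def jump(x):
    """The jump function of Example 9 on (0, 2*pi]."""
    return 1.0 if 0 < x <= math.pi else 0.0

def partial_sum(x, terms):
    """Constant term plus the first `terms` sine terms of the series (26)."""
    total = 0.5
    for n in range(1, terms + 1):
        k = 2 * n - 1                      # only odd multiples of x occur
        total += 2.0 / (k * math.pi) * math.sin(k * x)
    return total

# Errors at the middle of each half of the interval.
mid_high, mid_low = math.pi / 2, 3 * math.pi / 2
error_3 = abs(partial_sum(mid_high, 3) - jump(mid_high))
error_8 = abs(partial_sum(mid_high, 8) - jump(mid_high))
error_8_low = abs(partial_sum(mid_low, 8) - jump(mid_low))
```

At the jump itself (x = π) every partial sum equals 1/2, since all the sine terms vanish there, so the fit is good away from the discontinuity rather than at it.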
Representing a function in terms of an orthonormal set of functions as
in (20) has a virtually unlimited number of applications in the physical 
sciences and elsewhere. If one can solve a physical problem for the ortho- 
normal basis functions, then one can typically obtain a solution for any 
function as a linear combination of the solutions for the basis functions. This 
is true for most differential equations associated with electrical circuits, 
vibrating bodies, and so on. Statisticians use Fourier series to analyze time- 
series patterns (see Example 3 of Section 1.5). The study of Fourier series 
is one of the major fields of mathematics. 

We complete our discussion of vector spaces of functions by showing
how badly conditioned the powers of x are as a basis for representing
functions. Remember that the powers of x, $x^i$, i = 0, 1, ..., are linearly
independent. The problem is that they are far from orthogonal.

Let us consider how we might approximate an arbitrary function f(x)
as a linear combination of, say, the powers of x up to $x^5$:

\[ f(x) = w_0 + w_1x + w_2x^2 + w_3x^3 + w_4x^4 + w_5x^5 \tag{27} \]

using the continuous version of least-squares theory. If f(x) and the powers
of x were vectors, not functions, then (27) would have the familiar matrix
form f = Aw and the approximate solution w would be given by $\mathbf{w} = A^+\mathbf{f}$,
where $A^+ = (A^TA)^{-1}A^T$.


Let us generalize f = Aw to functions by letting the columns of a
matrix be functions. We define the functional "matrix" A(x):

\[ A(x) = [1, x, x^2, x^3, x^4, x^5] \]

Now (27) becomes

\[ f(x) = A(x)\mathbf{w} \tag{28} \]

To find the approximate solution to (28), we need to compute the
functional version of the pseudoinverse $A(x)^+$: $A(x)^+ = (A(x)^TA(x))^{-1}A(x)^T$
and then find the vector w of coefficients in (27):

\[ \mathbf{w} = A(x)^+f(x) = (A(x)^TA(x))^{-1}(A(x)^Tf(x)) \tag{29} \]

The matrix $A(x)^T$ has $x^i$ as its ith "row", so the matrix product
$A(x)^TA(x)$ involves computing the inner product of each "row" of $A(x)^T$ with
each "column" of A(x):

entry (i, j) in $A(x)^TA(x)$ is $x^i \cdot x^j$ $(= \int x^ix^j\,dx)$


Similarly, the matrix-"vector" product $A(x)^Tf(x)$ is the vector of inner
products $x^i \cdot f(x)$. The computations are simplest if we use the interval [0, 1].
Then entry (i, j) of $A(x)^TA(x)$ is

\[ x^i \cdot x^j = \int_0^1 x^{i+j}\,dx = \left[\frac{x^{i+j+1}}{i + j + 1}\right]_0^1 = \frac{1}{i + j + 1} \tag{30} \]

For example, entry (1, 2) is $\int x \cdot x^2\,dx = \int x^3\,dx = \tfrac{1}{4}$. Note that we consider
the constant function 1 $(= x^0)$ to be the zeroth row of $A(x)^T$.
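Computing all the inner products for $A(x)^TA(x)$ yields

\[ A(x)^TA(x) = \begin{bmatrix} 1 & \tfrac{1}{2} & \tfrac{1}{3} & \tfrac{1}{4} & \tfrac{1}{5} & \tfrac{1}{6} \\ \tfrac{1}{2} & \tfrac{1}{3} & \tfrac{1}{4} & \tfrac{1}{5} & \tfrac{1}{6} & \tfrac{1}{7} \\ \tfrac{1}{3} & \tfrac{1}{4} & \tfrac{1}{5} & \tfrac{1}{6} & \tfrac{1}{7} & \tfrac{1}{8} \\ \tfrac{1}{4} & \tfrac{1}{5} & \tfrac{1}{6} & \tfrac{1}{7} & \tfrac{1}{8} & \tfrac{1}{9} \\ \tfrac{1}{5} & \tfrac{1}{6} & \tfrac{1}{7} & \tfrac{1}{8} & \tfrac{1}{9} & \tfrac{1}{10} \\ \tfrac{1}{6} & \tfrac{1}{7} & \tfrac{1}{8} & \tfrac{1}{9} & \tfrac{1}{10} & \tfrac{1}{11} \end{bmatrix} \tag{31} \]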

This matrix is very ill-conditioned since the columns are all similar to
each other. When the fractions in (31) are expressed to six decimal places,
such as $\tfrac{1}{3}$ = .333333, the inverse given by the author's microcomputer was
(with entries rounded to integer values)


Fractions expressed to six decimal places

\[ (A(x)^TA(x))^{-1} = \begin{bmatrix} 17 & -116 & -47 & 1{,}180 & -1{,}986 & 958 \\ -116 & 342 & 7{,}584 & -34{,}881 & 49{,}482 & -22{,}548 \\ -47 & 7{,}584 & -76{,}499 & 242{,}494 & -301{,}846 & 129{,}004 \\ 1{,}180 & -34{,}881 & 242{,}494 & -644{,}439 & 723{,}636 & -289{,}134 \\ -1{,}986 & 49{,}482 & -301{,}846 & 723{,}636 & -747{,}725 & 278{,}975 \\ 958 & -22{,}548 & 129{,}004 & -289{,}134 & 278{,}975 & -97{,}180 \end{bmatrix} \tag{32} \]


The (absolute) sum of the fifth column in (32) is about 2,000,000. The first
column in (31) sums to about 2.5. So the condition number of $A(x)^TA(x)$,
in the sum norm, is about 2,000,000 × 2.5 = 5,000,000. Now that is an
ill-conditioned matrix!

We rounded fractions to six significant digits, but our condition number 
tells us that without a seventh significant digit, our numbers in (32) could 
be off by 500% error [a relative error of .000001 in A(x)’A(x) could yield 
answers off by a factor of 5 in pseudoinverse calculations]. Thus the numbers 
in (32) are worthless. 

Suppose that we enter the matrix in (31) again, now expressing frac- 
tions to seven decimal places. The new inverse computation yields 
Fractions expressed to seven decimal places


$(A(x)^TA(x))^{-1}$ =
5] — 1,051 6,160 — 1,475 15,419 —5,845 
= 1,051 26,385 = 165,/605 410,749 — 438,029 168,208 


6,160 —165,765 1,079,198 °'—2, 731,939 2;999;103' —1,146,231 
— 1,475 410,749 —2,731,939 1Vil,399 =~ 1,071,190 2,999 546 
15,419 —438,029 £909,105: <— 45,071,190 8,454,598 —3,327,362 
—35,845 168,208 —1,146,281 2,999,546 —3,327,362 1,316,523 


(33) 


We have a totally different matrix. Most of the entries in (33) are about 10 
times larger than corresponding entries in (32). The sum of the fifth column 
in (33) is about 23,000,000. If we use (33), the condition number of 
A(x)’ A(x) is around 56,000,000. Our entries in (33) were rounded to seven 
significant digits, but the condition number says eight significant digits were 
needed. Again our numbers are worthless. To compute the inverse accurately 
would require double-precision computation. 

It is only fair to note that the ill-conditioned matrix (31) is famously
bad. It is called a 6-by-6 Hilbert matrix [a Hilbert matrix has 1/(i + j + 1)
in entry (i, j)].
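
Roundoff can be taken out of the picture entirely by working with exact fractions. The sketch below builds the 6-by-6 Hilbert matrix, inverts it by Gauss-Jordan elimination in rational arithmetic, and computes the condition number in the sum norm (largest absolute column sum). The function names are hypothetical, and exact arithmetic is used here only as a check; it is not the method of the text.

```python
from fractions import Fraction

def hilbert(n):
    """n-by-n Hilbert matrix: entry (i, j) is 1/(i + j + 1), with i, j starting at 0."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def invert(M):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(M)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot_row = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot_row] = aug[pivot_row], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def sum_norm(M):
    """Largest absolute column sum."""
    return max(sum(abs(row[j]) for row in M) for j in range(len(M[0])))

H = hilbert(6)
H_inv = invert(H)
condition = sum_norm(H) * sum_norm(H_inv)
```

The exact inverse has whole-number entries, beginning with 36 in the upper-left corner, so neither (32) nor (33) is close to it; the exact condition number in the sum norm comes out in the tens of millions.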

Suppose that we used the numbers in (32) for $(A(x)^TA(x))^{-1}$ in
computing the pseudoinverse. Let us proceed to calculate $A(x)^+$ and then
compute the coefficients in an approximation for a function by a fifth-degree
polynomial. Let us choose $f(x) = e^x$. Then $(A(x)^Te^x)$ is the vector of
inner products $x^i \cdot e^x = \int x^ie^x\,dx$, i = 0, 1, ..., 5. Some calculus yields
$A(x)^Te^x$ = [2.718, 1, .718, .563, .465, .396] (expressed to three significant
digits).

Now inserting our values for $(A(x)^TA(x))^{-1}$ and $A(x)^Te^x$ into (29), we
obtain

\[ \mathbf{w} = (A(x)^TA(x))^{-1}(A(x)^Te^x) = \begin{bmatrix} 17 & -116 & -47 & 1{,}180 & -1{,}986 & 958 \\ -116 & 342 & 7{,}584 & -34{,}881 & 49{,}482 & -22{,}548 \\ -47 & 7{,}584 & -76{,}499 & 242{,}494 & -301{,}846 & 129{,}004 \\ 1{,}180 & -34{,}881 & 242{,}494 & -644{,}439 & 723{,}636 & -289{,}134 \\ -1{,}986 & 49{,}482 & -301{,}846 & 723{,}636 & -747{,}725 & 278{,}975 \\ 958 & -22{,}548 & 129{,}004 & -289{,}134 & 278{,}975 & -97{,}180 \end{bmatrix} \begin{bmatrix} 2.718 \\ 1 \\ .718 \\ .563 \\ .465 \\ .396 \end{bmatrix} \tag{34} \]
Thus our fifth-degree polynomial approximation of $e^x$ on the interval
[0, 1] is

\[ e^x = 17 - 87x - 219x^2 + 1611x^3 + 2449x^4 + 1135x^5 \tag{35} \]

Setting x = 1 in (35), we have $e^1$ = 17 - 86 - 219 + 1611 + 2449 +
1135 = 4907, pretty bad. Since our computed values in $(A(x)^TA(x))^{-1}$ are
meaningless, such a bad approximation of $e^x$ was to be expected.

Compare (35) with the Legendre polynomial approximation in
Example 8.


Section 5.4 Exercises 


Summary of Exercises 

Exercises 1-11 involve inverses, pseudoinverses, and projections for matri- 
ces with orthogonal columns. Exercises 12—21 involve Gram—Schmidt or- 
thogonalization and the QR decomposition. Exercises 22—30 present prob- 
lems about functional inner products and functional approximation. 


1. Compute the inverses of these matrices with orthogonal columns. Solve
Ax = [1, 2, 3], where A is the matrix in part (b).


6 8 a ae 
ef Pere Gult<o.) 06 2 


| 2 2 


2. Compute the inverses of these matrices with orthogonal columns. 


<i] 4 (ail Bees 6 

(a) 2 L ime (b) | —6 2 3 

l 2 3 3 6 2 
ae. l 
(ce) —.5 2 | 
l a 0 


Solve Ax = 1, where A is the matrix in part (a). 


3. Show that if A is an n-by-n upper triangular matrix with orthonormal 
columns, A is the identity matrix I. 


4. Compute the length k of the projection of b onto a and give the
projection vector ka.
(a) a = [0, 1, 0], b = [3, 2, 4]
(b) a = [1, −1, 2], b = [2, 3, 1]
(c) a = [3, 3, 3], b = [4, 1, 3]
(d) a = [2, −1, 3], b = [−2, 5, 3]


5. Express the vector [2, 1, 2] as a linear combination of the following 
orthogonal bases for three-dimensional space. 
tah Lt, = 8, 2h. be 2a th, 2 
(b) [3, 3, —3], (3, 3, 3], [3, —3, 3] 


(ce) (3, 1.5, othe es 2h Sk 4) 


6. Compute the pseudoinverse of 


ee 

3 3 

= eo 
A = ie 3 
a: 

3 3 


Find the least-squares solution to Ax = 1. 


7. Compute the pseudoinverse of 


3 + 
A= ]1 -2 
la 


Find the least-squares solution to Ax = I. 


8. Consider the regression model Z = gx + ry + s for the following data, 
where the x-value is a scaled score (to have average value of 0) of high 
school grades, the y-value is a scaled score of SAT scores, and the 
z-value is a score of college grades. 


Determine g, r, and s. Note that the x, y, and 1 vectors are mutually 
orthogonal. 


9. Verify that Theorem 2 is true in two dimensions, namely, that a change
from the standard $\{\mathbf{e}_1, \mathbf{e}_2\}$ basis to some other orthonormal basis
$\{\mathbf{q}_1, \mathbf{q}_2\}$ corresponds to a rotation (around the origin) and possibly a
reflection. Note that since $\mathbf{q}_1, \mathbf{q}_2$ have unit length, they are completely
determined by knowing the (counterclockwise) angles $\theta_1, \theta_2$ they make
with the positive $\mathbf{e}_1$ axis; also since $\mathbf{q}_1, \mathbf{q}_2$ are orthogonal,
$|\theta_1 - \theta_2| = 90°$.


10. (a) Show that an orthonormal change of basis preserves lengths (in
euclidean norm).
Hint: Verify that $(Q\mathbf{v}) \cdot (Q\mathbf{v}) = \mathbf{v} \cdot \mathbf{v}$ (where Q has orthonormal
columns) by using the identity $(A\mathbf{b}) \cdot (C\mathbf{d}) = \mathbf{b}^T(A^TC)\mathbf{d}$.

(b) Show that an orthonormal change of basis preserves angles.
Hint: Show that the cosine formula for the angle is unchanged by
the method in part (a).


11. Compute the angle between the following pairs of nonorthogonal
vectors. Which are close to orthogonal?

(a) [3, 2], (2; 4] (b) [1, Zs ats [2, =P 3]

(c) {h, =3,2)0} 274, 3}


12. Find the QR decomposition of the following matrices.

[The matrices for parts (a), (b), and (c) are illegible in this copy.]


13. Use the Gram-Schmidt orthogonalization to find an orthonormal basis
that generates the same vector space as the following bases:

(a) [1, 1], [2, =—E (b) [2, l, 2], [4, I, 1],

te) Pots Toe Se Ti thy 3-2]


14. (a) Compute the inverse of the matrix in Exercise 12, part (c) by first
finding the QR decomposition of the matrix and then using (15) to
get the inverse. (See Exercise 12 of Section 3.5 for instructions on
computing $R^{-1}$.) What is its condition number?

(b) Check your answer by computing the inverse by the regular
elimination by pivoting method.


15. (a) Find the pseudoinverse $A^+$ of the matrix A in Exercise 12, part
(b) by using the QR decomposition of A and computing $A^+$ as
$A^+ = R^{-1}Q^T$.

(b) Check your answer by finding the pseudoinverse from the formula
$A^+ = (A^TA)^{-1}A^T$. Note that this is a very poorly conditioned
matrix; compute the condition number of $A^TA$.


16. Use (16) to find the pseudoinverse in solving the refinery problem in
Example 3 of Section 5.3.


17. Use (16) to find the pseudoinverse in the following regression problems
using the model y = gx + r.

(a) (x, y) points: (0, 1), (2, 1), (4, 4)

(b) (x, y) points: (3, 2), (4, 5), (5, 5), (6, 5)

(c) (x, y) points: (−2, 1), (0, 1), (2, 4)


18. Use (16) to find the pseudoinverse in the least-squares polynomial-fitting
problem in Example 5 of Section 5.3.
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19. 


Verify (16): $A^+ = R^{-1}Q^T$, by substituting QR for A (and $R^TQ^T$ for
$A^T$) in the pseudoinverse formula $A^+ = (A^TA)^{-1}A^T$ and simplifying
(remember that R is invertible; we assume that the columns of A are
linearly independent).


20. Show that if the columns of the m-by-n matrix A are linearly
independent, the n-by-n matrix R of the QR decomposition must be
invertible.

Hint: Show main diagonal entries of R are nonzero and then see
Exercise 12 of Section 3.5 for instructions on computing the inverse of R.


21. Show that any set H of k orthonormal n-vectors can be extended to an
orthonormal basis for n-dimensional space.

Hint: Form an n-by-(k + n) matrix whose first k columns come from
H and whose remaining n columns form the identity matrix; now apply
the Gram-Schmidt orthogonalization to this matrix.


22. Over the interval [0, 1], compute the following inner products: x · x,
AE ME OK


23. Verify that the fourth Legendre polynomial is $x^3 - \frac{3}{5}x$.


24. Verify the values found for the weights $w_1$, $w_2$, $w_3$, and $w_4$ in
Example 8.

Note: You must use integration by parts, or a table of integrals.


25. Approximate the following functions f(x) as a linear combination of the
first four Legendre polynomials over the interval [−1, 1]: $L_0(x) = 1$,
$L_1(x) = x$, $L_2(x) = x^2 - \frac{1}{3}$, $L_3(x) = x^3 - 3x/5$.

(a) $f(x) = x^4$  (b) $f(x) = |x|$

(c) $f(x) = -1$: $x < 0$, $= 1$: $x \ge 0$


26. Approximate $x^3 + 2x - 1$ as a linear combination of the first four
Legendre polynomials over the interval [−1, 1]: $L_0(x) = 1$, $L_1(x) = x$,
$L_2(x) = x^2 - \frac{1}{3}$, $L_3(x) = x^3 - 3x/5$. Your "approximation" should
equal $x^3 + 2x - 1$, since this polynomial is a linear combination of
the functions 1, $x$, $x^2$, and $x^3$, from which the Legendre polynomials
were derived by orthogonalization.


27. (a) Find the Legendre polynomial of degree 4.
(b) Find the Legendre polynomial of degree 5.


28. (a) Using the interval [0, 1], instead of [−1, 1], find three orthogonal
polynomials of the form $K_0(x) = a$, $K_1(x) = bx + c$, and
$K_2(x) = dx^2 + ex + f$.

(b) Find a least-squares approximation of $x^4$ on the interval [0, 1] using
your three polynomials in part (a).

(c) Find a least-squares approximation of $x^{1/2}$ on the interval [0, 1]
using your three polynomials in part (a).

(d) Find a least-squares approximation of $x^2 - 2x + 1$ on the interval
[0, 1] using your three polynomials in part (a). Hopefully, your
approximation will equal $x^2 - 2x + 1$, since this polynomial is a
linear combination of 1, $x$, and $x^2$, the functions used to build your
set of orthogonal polynomials.


29. (a) Find a fourth polynomial $K_3(x)$ of order 3 orthogonal on [0, 1] to
the three polynomials in Exercise 28, part (a).
(b) Find a least-squares approximation to $x^4$ on the interval [0, 1] using
your four orthogonal polynomials.


30. Compute the inverse and find the condition number (in sum norm) of 
the following Hilbert-like matrices. 


2 Soe ee Tier ao. oo) Ske 

(a) |}s 9 ee es et. w- ne 
® eS: Mee ae eg 
9 10 (b) |s 6 7 cs) te 
ae ee ‘as See 

Bs Fer es pt eae © eae 


Eigenvector Bases and the Eigenvalue Decomposition


In this section we use eigenvectors to gain insight into the structure of a 
matrix. We review how an eigenvector basis simplifies the computation of 
powers of A. Then we present a way to decompose a matrix into simple 
matrices formed by the eigenvectors. This decomposition yields a way to 
compute all eigenvectors and eigenvalues of a symmetric matrix. 

Recall that a vector $\mathbf{u}$ is an eigenvector of the n-by-n matrix A if for
some $\lambda$ (an eigenvalue), $A\mathbf{u} = \lambda\mathbf{u}$. In words, multiplying $\mathbf{u}$ by the matrix A
has the same effect as multiplying $\mathbf{u}$ by the scalar $\lambda$. It follows that
$A^k\mathbf{u} = \lambda^k\mathbf{u}$. A stable distribution $\mathbf{p}$ of a Markov chain is an eigenvector of the
transition matrix A (associated with eigenvalue 1: $A\mathbf{p} = \mathbf{p}$). The dominant
eigenvector (associated with the largest eigenvalue) gives the long-term
distribution of a growth model (see Sections 2.5 and 4.5).

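As noted in Section 2.5, if a vector $\mathbf{x}$ can be expressed as a linear
combination of the eigenvectors $\mathbf{u}_i$ of A,

$$\mathbf{x} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_n\mathbf{u}_n \tag{1}$$

then

$$A\mathbf{x} = a_1A\mathbf{u}_1 + a_2A\mathbf{u}_2 + \cdots + a_nA\mathbf{u}_n
= a_1\lambda_1\mathbf{u}_1 + a_2\lambda_2\mathbf{u}_2 + \cdots + a_n\lambda_n\mathbf{u}_n \tag{2}$$

In words, matrix multiplication becomes scalar multiplication of eigenvector
coordinates.

Further,

$$A^k\mathbf{x} = a_1A^k\mathbf{u}_1 + a_2A^k\mathbf{u}_2 + \cdots + a_nA^k\mathbf{u}_n
= a_1\lambda_1^k\mathbf{u}_1 + a_2\lambda_2^k\mathbf{u}_2 + \cdots + a_n\lambda_n^k\mathbf{u}_n \tag{3}$$

When we express $\mathbf{x}$ in terms of a basis of A's eigenvectors, $A^k\mathbf{x}$ can quickly
be computed by (3).

Assume that we index the eigenvalues so that $|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|$.
Then for large k, $\lambda_1^k$ is going to be much larger in absolute value than the
other $\lambda_i^k$'s. Thus the first term on the right in (3) dominates the other
terms, and we have

$$A^k\mathbf{x} \approx a_1\lambda_1^k\mathbf{u}_1 \tag{4}$$

We review the example from Section 2.5 that illustrated these results.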


Example 1. Computing Powers of a Matrix
with Eigenvectors


The computer (C) and dog (D) growth model from Section 2.5 is

$$\mathbf{x}' = A\mathbf{x}, \quad \text{where } A = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}, \quad \mathbf{x} = [C, D], \quad \mathbf{x}' = [C', D']$$

The two eigenvalues and associated eigenvectors of A are $\lambda_1 = 4$ with
$\mathbf{u}_1 = [1, 1]$ and $\lambda_2 = 1$ with $\mathbf{u}_2 = [1, -2]$. Note that since $\mathbf{u}_1$ and
$\mathbf{u}_2$ are linearly independent, they form a basis for 2-space.

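Suppose that we want to determine the effects of this growth
model over 20 periods with the starting vector $\mathbf{x} = [1, 7]$. We want
to express $\mathbf{x}$ as a linear combination of $\mathbf{u}_1$ and $\mathbf{u}_2$: $\mathbf{x} = w_1\mathbf{u}_1 + w_2\mathbf{u}_2$.
Determining the set of weights $\mathbf{w} = [w_1, w_2]$ requires solving

$$\mathbf{x} = w_1\mathbf{u}_1 + w_2\mathbf{u}_2 \quad \text{or} \quad \mathbf{x} = U\mathbf{w}, \quad
\text{where } U = [\mathbf{u}_1 \ \mathbf{u}_2] = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix} \tag{5}$$

The solution to (5) is $\mathbf{w} = U^{-1}\mathbf{x}$, which yields $w_1 = 3$, $w_2 = -2$,
so that $\mathbf{x} = 3\mathbf{u}_1 - 2\mathbf{u}_2$. (Here $w_1$, $w_2$ are simply the coordinates of $\mathbf{x}$
in the eigenvector basis for 2-space.) Then

$$A\mathbf{x} = A(3\mathbf{u}_1 - 2\mathbf{u}_2) = 3A\mathbf{u}_1 - 2A\mathbf{u}_2
= 3(4\mathbf{u}_1) - 2(1\mathbf{u}_2) = 12\mathbf{u}_1 - 2\mathbf{u}_2 \tag{6}$$

For 20 periods, we have

$$A^{20}\mathbf{x} = A^{20}(3\mathbf{u}_1 - 2\mathbf{u}_2) = 3A^{20}\mathbf{u}_1 - 2A^{20}\mathbf{u}_2
= 3(4^{20}\mathbf{u}_1) - 2(1^{20}\mathbf{u}_2)
= 3 \cdot 4^{20}[1, 1] - 2[1, -2] \tag{7}$$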

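In Example 7 of Section 3.3 we observed that these steps were
represented by a matrix equation

$$A\mathbf{x} = UD_\lambda U^{-1}\mathbf{x}, \quad \text{where } D_\lambda = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix} \tag{8a}$$

and

$$A = UD_\lambda U^{-1} \tag{8b}$$

where U, as in (5), has the eigenvectors as its columns.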

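In words, we explain (8a) and (8b) as follows. The vector $\mathbf{w}$ in
(5) can be viewed as the vector $\mathbf{x}$ expressed in terms of eigenvector
coordinates, and as noted above, $\mathbf{w} = U^{-1}\mathbf{x}$. Looking at $UD_\lambda U^{-1}$
times $\mathbf{x}$ from right to left, the product $U^{-1}\mathbf{x}$ converts $\mathbf{x}$ to the
eigenvector-coordinate vector $\mathbf{w}$. Next the matrix $D_\lambda$ multiplies each
eigenvector coordinate by the appropriate eigenvalue ($D_\lambda\mathbf{w} = [4w_1, w_2]$) to
get the eigenvector-based coordinates after the matrix multiplication.
Finally, multiplying by U converts back to the original coordinate system.

The matrix-vector product $\mathbf{x}' = A\mathbf{x}$ is transformed, in eigenvector
coordinates, into $\mathbf{w}' = D_\lambda\mathbf{w}$. Further, $A^k = UD_\lambda^kU^{-1}$, so

$$\mathbf{x}^{(k)} = A^k\mathbf{x} \quad \text{becomes} \quad \mathbf{w}^{(k)} = D_\lambda^k\mathbf{w} \tag{9}$$

For our particular A,

$$\mathbf{w}^{(20)} = D_\lambda^{20}\mathbf{w} = \begin{bmatrix} 4^{20} & 0 \\ 0 & 1^{20} \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 4^{20}w_1 \\ w_2 \end{bmatrix} \tag{10}$$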
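
The computation in (7) can be checked numerically. The following is a small sketch under the values of this example (A, the eigenpairs, and the weights $w_1 = 3$, $w_2 = -2$); the helper `mat_vec` and the variable names are our own. It compares the eigenvector-coordinate shortcut with 20 explicit multiplications by A.

```python
def mat_vec(m, v):
    # Ordinary matrix-vector product.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

A = [[3, 1], [2, 2]]
u1, u2 = [1, 1], [1, -2]      # eigenvectors of A
lam1, lam2 = 4, 1             # associated eigenvalues
x = [1, 7]
w1, w2 = 3, -2                # x = 3*u1 - 2*u2

k = 20
# Shortcut (7): scale each eigenvector coordinate by the kth power of its eigenvalue.
via_eigen = [w1 * lam1**k * a + w2 * lam2**k * b for a, b in zip(u1, u2)]

# Direct computation: multiply by A twenty times.
direct = x
for _ in range(k):
    direct = mat_vec(A, direct)

print(via_eigen == direct)
```

All arithmetic here is on Python integers, so the two results can be compared exactly.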


Summarizing (part of this is simply Theorem 5 of Section 3.3), we
obtain


Theorem 1. Let A be an n-by-n matrix and let U be an n-by-n matrix
whose columns $\mathbf{u}_i$ are n linearly independent eigenvectors of A. If
$\mathbf{c} = A\mathbf{b}$, then in $\mathbf{u}_i$-coordinates $\mathbf{c}^*$, $\mathbf{b}^*$, we have

$$\mathbf{c}^* = D_\lambda\mathbf{b}^* \quad \text{or} \quad c_i^* = \lambda_i b_i^* \tag{11}$$

where $\mathbf{b}^* = U^{-1}\mathbf{b}$ and $\mathbf{c}^* = U^{-1}\mathbf{c}$ are the $\mathbf{u}_i$-coordinate vectors for
$\mathbf{b}$ and $\mathbf{c}$ and $D_\lambda$ is the diagonal matrix whose diagonal entries are
the eigenvalues of A, $\lambda_1, \lambda_2, \ldots, \lambda_n$. Similarly, if $\mathbf{c} = A^k\mathbf{b}$, then
$\mathbf{c}^* = D_\lambda^k\mathbf{b}^*$.

Theorem 2. For A, $D_\lambda$, and U as in Theorem 1, A can be written

$$A = UD_\lambda U^{-1} \tag{12a}$$

and

$$A^k = UD_\lambda^kU^{-1} \tag{12b}$$

where $D_\lambda^k$ has the eigenvalues raised to the kth power. Further,

$$D_\lambda = U^{-1}AU \tag{13}$$
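Example 2. Conversion of A to $D_\lambda$


Let us use the matrix A in Example 1, $A = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}$. We had found
U to be $U = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}$. By the determinant formula for 2-by-2
inverses, $U^{-1} = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{1}{3} \end{bmatrix}$. Then, by (13),

$$D_\lambda = U^{-1}AU = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{1}{3} \end{bmatrix}\begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix} = \begin{bmatrix} \frac{8}{3} & \frac{4}{3} \\ \frac{1}{3} & -\frac{1}{3} \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}$$

Let us next use (12b) to compute $A^k$. This formula is most useful for
large k, but for illustrative purposes we use k = 2. First we compute
$A^2$ directly:

$$A^2 = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} 11 & 5 \\ 10 & 6 \end{bmatrix}$$

Next we compute $A^2$ as

$$A^2 = UD_\lambda^2U^{-1} = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}\begin{bmatrix} 16 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{1}{3} \end{bmatrix} = \begin{bmatrix} 16 & 1 \\ 16 & -2 \end{bmatrix}\begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{1}{3} \end{bmatrix} = \begin{bmatrix} 11 & 5 \\ 10 & 6 \end{bmatrix} \tag{14}$$

The formula $A = UD_\lambda U^{-1}$ leads to an important way to decompose
a matrix into simple matrices. Just as Gaussian elimination yields the LU
decomposition and Gram-Schmidt orthogonalization yields the QR
decomposition, so our eigenvector coordinates formula also yields a matrix
decomposition.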

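Recall that if $\mathbf{c} = [1, 2]$ and $\mathbf{d} = [3, 4, -1]$, the simple matrix
$\mathbf{c} * \mathbf{d}$ equals

$$\mathbf{c} * \mathbf{d} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}\begin{bmatrix} 3 & 4 & -1 \end{bmatrix} = \begin{bmatrix} 3 & 4 & -1 \\ 6 & 8 & -2 \end{bmatrix}$$

More generally, entry (i, j) in $\mathbf{c} * \mathbf{d}$ equals $c_id_j$.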

Theorem 7 of Section 5.2 says that the matrix product CD can be
decomposed into a sum of simple matrices formed by the columns $\mathbf{c}_i^C$ of C and
the rows $\mathbf{d}_i^R$ of D:

$$CD = \mathbf{c}_1^C * \mathbf{d}_1^R + \mathbf{c}_2^C * \mathbf{d}_2^R + \cdots + \mathbf{c}_n^C * \mathbf{d}_n^R \tag{15}$$

Letting $C = U$ and $D = D_\lambda U^{-1}$ (the ith row of D is the ith row of $U^{-1}$
multiplied by $\lambda_i$), we obtain the following decomposition.


Theorem 3. Eigenvalue Decomposition. Let A be an n-by-n matrix
with n linearly independent eigenvectors $\mathbf{u}_i$ associated with eigenvalues
$|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$. Let $\mathbf{u}_i'$ denote the ith row of $U^{-1}$. Then A
is the weighted sum of simple matrices:

$$A = UD_\lambda U^{-1} = \lambda_1\mathbf{u}_1 * \mathbf{u}_1' + \lambda_2\mathbf{u}_2 * \mathbf{u}_2' + \cdots + \lambda_n\mathbf{u}_n * \mathbf{u}_n' \tag{16}$$

In (16) the $\lambda_i$'s are factored out in front of the simple matrices.

For typical large matrices, the eigenvalues tend to decline in size
quickly. For example, if n = 20, perhaps $\lambda_1 = 5$, $\lambda_2 = 2$, $\lambda_3 = .6$,
$\lambda_4 = .02$; so the sum of the first three simple matrices in (16) would yield
a very good approximation of the matrix.


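Example 3. Eigenvalue Decomposition
of 2-by-2 Matrix


We illustrate Theorem 3 with the 2-by-2 matrix of Examples 1 and 2.
From those examples we have $\lambda_1 = 4$, $\lambda_2 = 1$ and

$$U = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}, \quad U^{-1} = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{1}{3} \end{bmatrix}$$

Then (16) says

$$A = 4\begin{bmatrix} 1 \\ 1 \end{bmatrix} * \begin{bmatrix} \tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix} + 1\begin{bmatrix} 1 \\ -2 \end{bmatrix} * \begin{bmatrix} \tfrac{1}{3} & -\tfrac{1}{3} \end{bmatrix}
= \begin{bmatrix} \frac{8}{3} & \frac{4}{3} \\ \frac{8}{3} & \frac{4}{3} \end{bmatrix} + \begin{bmatrix} \frac{1}{3} & -\frac{1}{3} \\ -\frac{2}{3} & \frac{2}{3} \end{bmatrix}
= \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}$$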
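
The decomposition (16) for this 2-by-2 matrix can be reproduced in exact arithmetic. The sketch below assumes the values of Examples 1 and 2 (U, its inverse, and the eigenvalues 4 and 1); the helper name `simple_matrix` is our own label for the product $\mathbf{c} * \mathbf{d}$.

```python
from fractions import Fraction as F

U = [[1, 1], [1, -2]]                            # eigenvectors as columns
U_inv = [[F(2, 3), F(1, 3)], [F(1, 3), F(-1, 3)]]
eigenvalues = [4, 1]

def simple_matrix(c, d):
    # Simple matrix c * d: entry (i, j) is c_i * d_j.
    return [[ci * dj for dj in d] for ci in c]

n = len(U)
total = [[F(0)] * n for _ in range(n)]
for i, lam in enumerate(eigenvalues):
    column = [U[r][i] for r in range(n)]         # ith column of U
    row = U_inv[i]                               # ith row of U^{-1}
    term = simple_matrix(column, row)
    total = [[total[r][c] + lam * term[r][c] for c in range(n)]
             for r in range(n)]

print(total)  # the weighted sum of the two simple matrices
```

Using `Fraction` avoids roundoff in the thirds, so the sum can be compared entry by entry with A.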


We next state a very useful fact about symmetric matrices. 




Theorem 4

(i) Any symmetric n-by-n matrix A has a set of n orthogonal
eigenvectors $\mathbf{u}_i$.

(ii) Two eigenvectors associated with distinct eigenvalues are always
orthogonal.


Corollary. When A is symmetric, the matrix U in Theorem 1 has orthogonal
columns, and $U^{-1}$ is obtained from $U^T$ by dividing each row by
$|\mathbf{u}_i|^2 = \mathbf{u}_i \cdot \mathbf{u}_i$. When additionally the columns have length 1
(orthonormal), the eigenvalue decomposition in Theorem 3 becomes

$$A = \lambda_1\mathbf{u}_1 * \mathbf{u}_1 + \lambda_2\mathbf{u}_2 * \mathbf{u}_2 + \cdots + \lambda_n\mathbf{u}_n * \mathbf{u}_n \tag{17}$$

The corollary's claim about how to obtain $U^{-1}$ comes from Theorem
1 of Section 5.4.

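The eigenvalue decomposition (17) sheds new light on what happens
when we multiply a symmetric matrix A times some vector $\mathbf{x}$. If in
$\mathbf{u}_i$-coordinates ($\mathbf{u}_i$ are orthonormal), $\mathbf{x}$ is

$$\mathbf{x} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_n\mathbf{u}_n \tag{18}$$

and we compute $A\mathbf{x}$ using (17), we have

$$A\mathbf{x} = (\lambda_1\mathbf{u}_1 * \mathbf{u}_1 + \lambda_2\mathbf{u}_2 * \mathbf{u}_2 + \cdots + \lambda_n\mathbf{u}_n * \mathbf{u}_n)\mathbf{x}
= \lambda_1(\mathbf{u}_1 * \mathbf{u}_1)\mathbf{x} + \lambda_2(\mathbf{u}_2 * \mathbf{u}_2)\mathbf{x} + \cdots + \lambda_n(\mathbf{u}_n * \mathbf{u}_n)\mathbf{x} \tag{19}$$

and (19) is equivalent to

$$A\mathbf{x} = \lambda_1(\mathbf{u}_1 \cdot \mathbf{x})\mathbf{u}_1 + \lambda_2(\mathbf{u}_2 \cdot \mathbf{x})\mathbf{u}_2 + \cdots + \lambda_n(\mathbf{u}_n \cdot \mathbf{x})\mathbf{u}_n \tag{20}$$

since

$$(\mathbf{u}_1 * \mathbf{u}_1)\mathbf{x} = \mathbf{u}_1(\mathbf{u}_1 \cdot \mathbf{x}) \quad \text{or} \quad = (\mathbf{u}_1 \cdot \mathbf{x})\mathbf{u}_1 \tag{21}$$

and similarly for the other $\mathbf{u}_i$.

We verify (21) as follows. The first $\mathbf{u}_1$ in $\mathbf{u}_1 * \mathbf{u}_1$ is to be treated as
an n-by-1 matrix, the second $\mathbf{u}_1$ as a 1-by-n matrix, and $\mathbf{x}$ as a column vector
(an n-by-1 matrix). Now use the associative law of matrix multiplication
in (21).

So multiplying A by $\mathbf{x}$, when A is represented as a sum of simple
matrices, has the effect of first projecting $\mathbf{x}$ onto the eigenvectors $\mathbf{u}_i$
[$(\mathbf{u}_i \cdot \mathbf{x})\mathbf{u}_i$ is this projection] and then multiplying each such projection by
the eigenvalue $\lambda_i$.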

In Section 3.4 we used equation (4), that $A^k\mathbf{x}$ is approximately
a multiple of $\mathbf{u}_1$, to determine $\mathbf{u}_1$ and $\lambda_1$. But we had no way to compute
other eigenvalues or eigenvectors. The eigenvalue decomposition (17) gives
us a way.

Suppose that A is a symmetric matrix and we have determined the
dominant (largest) eigenvalue $\lambda_1$ and an associated eigenvector $\mathbf{u}_1$ (of length
1) by an iterative method. Consider then the matrix

$$A_2 = A - \lambda_1\mathbf{u}_1 * \mathbf{u}_1 \tag{22}$$

The matrix $A_2$ is A minus the first simple matrix in the eigenvalue
decomposition in (17). It follows that

$$A_2 = \lambda_2\mathbf{u}_2 * \mathbf{u}_2 + \lambda_3\mathbf{u}_3 * \mathbf{u}_3 + \cdots + \lambda_n\mathbf{u}_n * \mathbf{u}_n \tag{23}$$

Since the eigenvalue decomposition of a square matrix is unique, it
follows that the (nonzero) eigenvalues of $A_2$ are $\lambda_2, \lambda_3, \ldots, \lambda_n$ with
associated eigenvectors $\mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n$. In particular, $\lambda_2$ is now the
dominant eigenvalue of $A_2$ and applying an iterative method to $A_2$ will yield $\lambda_2$
and $\mathbf{u}_2$.

If we subtract the second simple matrix in (17) from $A_2$, we will get
a matrix $A_3$ whose dominant eigenvalue is $\lambda_3$, and so on. This method of
getting the eigenvalues and eigenvectors of A is called deflation. We note
that if we have a small error in $\lambda_1$ or $\mathbf{u}_1$, the resulting $A_2$ still has the same
dominant eigenvalue and eigenvector (the consequences of such errors are
explored in the Exercises).


Deflation Method to Compute Eigenvalues
and Eigenvectors of a Symmetric Matrix A


Step 0. Set $A_1 = A$; set i = 1.


Step 1. Use the iterative method to determine $A_i$'s dominant eigenvalue
$\lambda_i$ and an associated eigenvector $\mathbf{u}_i$ (of length 1).


Step 2. Set $A_{i+1} = A_i - \lambda_i\mathbf{u}_i * \mathbf{u}_i$. Increase i by 1. If $i \le n$, go to
step 1.
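
The steps above can be sketched in a few lines of code. This is only an illustration under our own assumptions: the "iterative method" is taken to be repeated multiplication with normalization, the eigenvalue is estimated by $\mathbf{u} \cdot A\mathbf{u}$ for the unit vector $\mathbf{u}$, the starting vector is [1, 0, ..., 0], and the 3-by-3 symmetric test matrix and all function names are chosen by us.

```python
import math

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dominant_eigenpair(m, iterations=200):
    # Step 1: repeated multiplication by m, rescaling to length 1 each time.
    x = [1.0] + [0.0] * (len(m) - 1)
    for _ in range(iterations):
        y = mat_vec(m, x)
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    lam = sum(a * b for a, b in zip(x, mat_vec(m, x)))  # u . (m u) for unit u
    return lam, x

def deflation(a, iterations=200):
    # Steps 0-2: find a dominant pair, subtract its simple matrix, repeat n times.
    n = len(a)
    current = [row[:] for row in a]
    pairs = []
    for _ in range(n):
        lam, u = dominant_eigenpair(current, iterations)
        pairs.append((lam, u))
        current = [[current[i][j] - lam * u[i] * u[j] for j in range(n)]
                   for i in range(n)]
    return pairs

# Illustrative symmetric matrix (an assumption of this sketch).
A = [[3.0, 2.0, 1.0], [2.0, 3.0, 1.0], [1.0, 1.0, 0.0]]
pairs = deflation(A)
eigenvalues = [lam for lam, _ in pairs]
print(eigenvalues)
```

The sketch assumes that the starting vector has a nonzero component along each dominant eigenvector and that no deflated matrix is the zero matrix; a more careful routine would check both and would stop iterating once the vector settles down rather than using a fixed count.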

A faster method for determining all eigenvalues and eigenvectors of 
any n-by-n matrix, symmetric or not, is presented in the appendix to this 
section. 
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Example 4. Finding Eigenvalues and Eigenvectors 
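of a 3-by-3 Symmetric Matrix


Let us use the deflation method to find all eigenvalues and eigenvectors
of the symmetric matrix

$$A = \begin{bmatrix} 3 & 2 & 1 \\ 2 & 3 & 1 \\ 1 & 1 & 0 \end{bmatrix}$$

Setting $\mathbf{x} = [1, 0, 0]$ and computing $A^k\mathbf{x}$ for a large power k, we get a
multiple of the unit vector

$$\mathbf{u}_1 = [.684, .684, .254] \quad \text{with } \lambda_1 = 5.37$$

Next we compute the deflated matrix $A_2$:

$$A_2 = A - \lambda_1\mathbf{u}_1 * \mathbf{u}_1
= \begin{bmatrix} 3 & 2 & 1 \\ 2 & 3 & 1 \\ 1 & 1 & 0 \end{bmatrix} - \begin{bmatrix} 2.51 & 2.51 & .93 \\ 2.51 & 2.51 & .93 \\ .93 & .93 & .35 \end{bmatrix}
= \begin{bmatrix} .49 & -.51 & .07 \\ -.51 & .49 & .07 \\ .07 & .07 & -.35 \end{bmatrix}$$

Again with $\mathbf{x} = [1, 0, 0]$, we compute $A_2^k\mathbf{x}$ for large k and get [.5, −.5, 0]. So

$$\mathbf{u}_2 = [.707, -.707, 0] \quad \text{with } \lambda_2 = 1$$

(The reader should verify that [.5, −.5, 0] was an eigenvector of the
original matrix A.) Deflating again, we have

$$A_3 = A_2 - \lambda_2\mathbf{u}_2 * \mathbf{u}_2
= \begin{bmatrix} .49 & -.51 & .07 \\ -.51 & .49 & .07 \\ .07 & .07 & -.35 \end{bmatrix} - \begin{bmatrix} .5 & -.5 & 0 \\ -.5 & .5 & 0 \\ 0 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} -.01 & -.01 & .07 \\ -.01 & -.01 & .07 \\ .07 & .07 & -.35 \end{bmatrix} \tag{24}$$

Next we find that $\mathbf{u}_3 = [-.181, -.181, .967]$ and $\lambda_3 = -.37$. Computing
the last simple matrix, we obtain

$$\lambda_3\mathbf{u}_3 * \mathbf{u}_3 = \begin{bmatrix} -.01 & -.01 & .07 \\ -.01 & -.01 & .07 \\ .07 & .07 & -.35 \end{bmatrix}$$

This simple matrix equals $A_3$, as required. Thus we have confirmed
that A is the sum of three simple matrices.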


Substantial savings in computer storage can be realized by representing 
a large matrix as a sum of simple matrices. If a symmetric 20-by-20 matrix 
is well approximated as the sum of two simple matrices, then only 10% as 
much storage is needed: 2 columns instead of all 20 columns. 


Example 5. Approximating a Digital Picture 
with Simple Matrices 


Approximate the 8-by-8 digital "picture" A whose entries represent
varying levels of darkness between 0 and 1.

$$A = \begin{bmatrix}
0 & 0 & 0 & .2 & .2 & 0 & 0 & 0 \\
0 & .1 & .2 & .3 & .3 & .2 & .1 & 0 \\
0 & .2 & .5 & .6 & .6 & .5 & .2 & 0 \\
.2 & .3 & .6 & .8 & .8 & .6 & .3 & .2 \\
.2 & .3 & .6 & .8 & .8 & .6 & .3 & .2 \\
0 & .2 & .5 & .6 & .6 & .5 & .2 & 0 \\
0 & .1 & .2 & .3 & .3 & .2 & .1 & 0 \\
0 & 0 & 0 & .2 & .2 & 0 & 0 & 0
\end{bmatrix} \tag{25}$$

We first approximate A with the simple matrix $\mathbf{c} * \mathbf{c}$, where $\mathbf{c} =
[.1, .3, .7, .9, .9, .7, .3, .1]$ has $c_i$ equal to the (approximate) square
root of the diagonal entry (i, i) of A.

$$\mathbf{c} * \mathbf{c} = \begin{bmatrix}
.01 & .03 & .07 & .09 & .09 & .07 & .03 & .01 \\
.03 & .09 & .21 & .27 & .27 & .21 & .09 & .03 \\
.07 & .21 & .49 & .63 & .63 & .49 & .21 & .07 \\
.09 & .27 & .63 & .81 & .81 & .63 & .27 & .09 \\
.09 & .27 & .63 & .81 & .81 & .63 & .27 & .09 \\
.07 & .21 & .49 & .63 & .63 & .49 & .21 & .07 \\
.03 & .09 & .21 & .27 & .27 & .21 & .09 & .03 \\
.01 & .03 & .07 & .09 & .09 & .07 & .03 & .01
\end{bmatrix} \tag{26}$$

Now we shall use the eigenvalue decomposition of A into simple
matrices to approximate A more accurately. We use the first three terms
involving the three largest eigenvalues,

$$A \approx \lambda_1\mathbf{u}_1 * \mathbf{u}_1 + \lambda_2\mathbf{u}_2 * \mathbf{u}_2 + \lambda_3\mathbf{u}_3 * \mathbf{u}_3 \tag{27}$$

Actually, for this A, the first simple matrix $\lambda_1\mathbf{u}_1 * \mathbf{u}_1$ alone will be a
good approximation.

For large matrices, the iterative method of computing $A^k\mathbf{x}$ for a
large k to approximate $\mathbf{u}_1$ is unnecessarily time consuming. One of the
faster methods mentioned in the appendix to this section should be
used. For $\mathbf{u}_1$, we shall give the results using simple iteration because it
converges quickly. Computing $A^{10}\mathbf{x}$ with $\mathbf{x} = [1, 1, 1, 1, 1, 1, 1, 1]$
yields a multiple of

$$\mathbf{u}_1 = [.078, .189, .408, .540, .540, .408, .189, .078] \quad \text{with } \lambda_1 = 2.774$$

so

$$\lambda_1\mathbf{u}_1 * \mathbf{u}_1 = \begin{bmatrix}
.017 & .041 & .088 & .117 & .117 & .088 & .041 & .017 \\
.041 & .099 & .214 & .283 & .283 & .214 & .099 & .041 \\
.088 & .214 & .461 & .610 & .610 & .461 & .214 & .088 \\
.117 & .283 & .610 & .808 & .808 & .610 & .283 & .117 \\
.117 & .283 & .610 & .808 & .808 & .610 & .283 & .117 \\
.088 & .214 & .461 & .610 & .610 & .461 & .214 & .088 \\
.041 & .099 & .214 & .283 & .283 & .214 & .099 & .041 \\
.017 & .041 & .088 & .117 & .117 & .088 & .041 & .017
\end{bmatrix} \tag{28}$$

This first simple matrix is very close to A. Except for entries (1, 3)
and (1, 4) (and symmetrically equivalent entries), every entry in (28)
is within about .04 of the corresponding entry in A. Since the numbers
in A were probably rounded off to one decimal digit, one could argue
that (28) is as good an approximation to A as we should seek.

Next we compute the deflated matrix $A_2$:

$$A_2 = A - \lambda_1\mathbf{u}_1 * \mathbf{u}_1 = \begin{bmatrix}
-.017 & -.041 & -.088 & .083 & .083 & -.088 & -.041 & -.017 \\
-.041 & .001 & -.014 & .017 & .017 & -.014 & .001 & -.041 \\
-.088 & -.014 & .039 & -.010 & -.010 & .039 & -.014 & -.088 \\
.083 & .017 & -.010 & -.008 & -.008 & -.010 & .017 & .083 \\
.083 & .017 & -.010 & -.008 & -.008 & -.010 & .017 & .083 \\
-.088 & -.014 & .039 & -.010 & -.010 & .039 & -.014 & -.088 \\
-.041 & .001 & -.014 & .017 & .017 & -.014 & .001 & -.041 \\
-.017 & -.041 & -.088 & .083 & .083 & -.088 & -.041 & -.017
\end{bmatrix} \tag{29}$$

Using the iterative method on $A_2$ takes a long time to converge but
finally gives us a multiple of

$$\mathbf{u}_2 = [.514, .224, .258, -.345, -.345, .258, .224, .514] \quad \text{with } \lambda_2 = -.27$$

Instead of computing the second simple matrix, we give the sum of
the first two simple matrices,

$$\lambda_1\mathbf{u}_1 * \mathbf{u}_1 + \lambda_2\mathbf{u}_2 * \mathbf{u}_2 = \begin{bmatrix}
-.054 & .010 & .052 & .165 & .165 & .052 & .010 & -.054 \\
.010 & .084 & .198 & .304 & .304 & .198 & .084 & .010 \\
.052 & .198 & .443 & .634 & .634 & .443 & .198 & .052 \\
.165 & .304 & .634 & .776 & .776 & .634 & .304 & .165 \\
.165 & .304 & .634 & .776 & .776 & .634 & .304 & .165 \\
.052 & .198 & .443 & .634 & .634 & .443 & .198 & .052 \\
.010 & .084 & .198 & .304 & .304 & .198 & .084 & .010 \\
-.054 & .010 & .052 & .165 & .165 & .052 & .010 & -.054
\end{bmatrix} \tag{30}$$

Next we form $A_3$ and find that

$$\mathbf{u}_3 = [.451, -.054, -.454, .295, .295, -.454, -.054, .451] \quad \text{with } \lambda_3 = .263$$

Note how close in absolute value $\lambda_2$ and $\lambda_3$ are; this is why the
iterative method converged so slowly for $A_2$.
Now we can give the desired approximation of A by the first
three simple matrices associated with the eigenvalue decomposition of
A.

$$A \approx \lambda_1\mathbf{u}_1 * \mathbf{u}_1 + \lambda_2\mathbf{u}_2 * \mathbf{u}_2 + \lambda_3\mathbf{u}_3 * \mathbf{u}_3 = \begin{bmatrix}
-.001 & .004 & -.002 & .200 & .200 & -.002 & .004 & -.001 \\
.004 & .089 & .204 & .300 & .300 & .204 & .089 & .004 \\
-.002 & .204 & .497 & .599 & .599 & .497 & .204 & -.002 \\
.200 & .300 & .599 & .799 & .799 & .599 & .300 & .200 \\
.200 & .300 & .599 & .799 & .799 & .599 & .300 & .200 \\
-.002 & .204 & .497 & .599 & .599 & .497 & .204 & -.002 \\
.004 & .089 & .204 & .300 & .300 & .204 & .089 & .004 \\
-.001 & .004 & -.002 & .200 & .200 & -.002 & .004 & -.001
\end{bmatrix} \tag{31}$$

The average deviation of an entry of (31) from A is .002, and
only one entry has an error exceeding .004.
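
The first step of this example can be sketched in code. The matrix entries below are our reading of the picture (one-decimal darkness levels, symmetric about both the middle row and the middle column), and the iteration count and variable names are our own choices; treat this as an illustration rather than the book's program.

```python
import math

# Upper-left quarter of the picture; the rest follows by symmetry (assumed entries).
quarter = [[0, 0, 0, .2],
           [0, .1, .2, .3],
           [0, .2, .5, .6],
           [.2, .3, .6, .8]]
top = [row + row[::-1] for row in quarter]
A = top + top[::-1]
n = len(A)

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Simple iteration from the all-ones vector, rescaled to length 1 at each step.
u = [1.0] * n
for _ in range(50):
    y = mat_vec(A, u)
    norm = math.sqrt(sum(t * t for t in y))
    u = [t / norm for t in y]
lam = sum(a * b for a, b in zip(u, mat_vec(A, u)))

# First simple matrix lam * (u * u) and its worst entrywise error against A.
approx = [[lam * u[i] * u[j] for j in range(n)] for i in range(n)]
max_error = max(abs(A[i][j] - approx[i][j]) for i in range(n) for j in range(n))
print(round(lam, 3), round(max_error, 3))
```

Storing this approximation requires only the 8 entries of `u` and the single number `lam`, instead of all 64 entries of A.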
We next give a statistical application of the eigenvalue decomposition
of a matrix.


Example 6. Principal Components in
Statistical Analysis


Suppose that an anthropologist has collected data on 25 physical char- 
acteristics, variables x,, x5, . . . , X25, for 100 prehuman fossil remains. 
The researcher computes a measure of the variability $V_i$ of each variable
$x_i$ called the variance of $x_i$. A large variance means that the $x_i$-variable
varies substantially from fossil to fossil. The anthropologist also
computes a measure of the joint variability $\mathrm{Cov}_{ij}$ of each pair of variables
$x_i$, $x_j$ called the covariance. The covariance is proportional to the
correlation coefficient (which was discussed in Section 5.3). A positive
$\mathrm{Cov}_{ij}$ means variables $x_i$ and $x_j$ have similar values; a negative $\mathrm{Cov}_{ij}$
means the variables are opposites (if the kth fossil has a large $x_i$-value,
the kth fossil probably has a small or negative $x_j$-value); and $\mathrm{Cov}_{ij}$ near
0 means values of $x_i$ and $x_j$ are unrelated (uncorrelated).

The anthropologist would like to find good linear combinations 
of the characteristics that ‘‘explain’’ the variability of the data. For 
example, one might define the /ength index L to be 


L= 24, + 4x, — 3X, + 4x47 + Sr (32) 


where x, might be length of forearm, x, length of thigh, and so on. 
The idea is that although we may find a certain amount of variability 
in individual variables from fossil to fossil, such as varying length of 
forearm, the ‘‘right’’ measure that gives the best way to distinguish 
one fossil from another is some composite index, such as the length 
index. 

Among all possible indices formed by a linear combination of
variables, the index $I_1$ that shows the greatest variability (i.e., largest
variance) is called the first principal component. Among those other
indices that are uncorrelated to $I_1$ (covariance is 0), the index $I_2$ with
the largest variance is called the second principal component, and so
on. We want index $I_2$ uncorrelated so that it gives us new (additional)
information about variability that was not contained in $I_1$.

In summary, the first principal component gives an index that 
explains the maximum variability of the data from one fossil to another. 
The first four principal components will typically account for over 90% 
of the variability in a set of 25 variables. Clearly, there are great 
advantages in describing each fossil with three or four numbers rather
than 25 numbers. The same is true for studies in psychology, finance, 
quality control, and any other field where people collect large amounts 
of data. 

So how do we find these principal components? That is, how do
we determine the weights (coefficients) of the $x_i$, such as the weights
in (32)? The answer is that we form a covariance matrix C of all the
covariances of the fossil data, where entry (i, j) is $\mathrm{Cov}_{ij}$ (and $\mathrm{Cov}_{ii} =
V_i$). Then one can show that the vector of weights used in the first
principal component is the unit-length dominant eigenvector $\mathbf{u}_1$ of C.
The first simple matrix $\lambda_1(\mathbf{u}_1 * \mathbf{u}_1)$ in the eigenvalue decomposition of
C shows how much of the variability in C is explained by the first
principal component.

As an example, consider the following 4-by-4 covariance matrix
(representing 4 of the 25 characteristics mentioned above)

$$C = \begin{bmatrix}
.86 & 1.19 & 2.02 & 1.45 \\
1.19 & 1.68 & 2.86 & 2.06 \\
2.02 & 2.86 & 5.05 & 3.50 \\
1.45 & 2.06 & 3.50 & 2.53
\end{bmatrix} \tag{33}$$

All the covariances here happen to be positive (this is often the case),
although they can be negative. We can determine the eigenvalues and
associated eigenvectors by deflation, as in the previous examples, or
by using some computer package. We find that

$$\lambda_1 = 10, \quad \lambda_2 = .098, \quad \lambda_3 = .022, \quad \lambda_4 = .003 \tag{34}$$

The dominant (unit-length) eigenvector is

$$\mathbf{u}_1 = [.289, .408, .707, .5]$$

The simple matrix $\lambda_1\mathbf{u}_1 * \mathbf{u}_1$ should approximate C well, since $\lambda_1$ is
much larger than the other eigenvalues.

$$\lambda_1\mathbf{u}_1 * \mathbf{u}_1 = \begin{bmatrix}
.84 & 1.18 & 2.04 & 1.44 \\
1.18 & 1.67 & 2.88 & 2.04 \\
2.04 & 2.88 & 5.00 & 3.54 \\
1.44 & 2.04 & 3.54 & 2.50
\end{bmatrix} \tag{35}$$

Upon comparing (35) with C, it is clear that $I_1$ accounts for almost all
the variability in the covariance matrix C.

The first principal component index $I_1$ is the linear combination
of variables $x_1$, $x_2$, $x_3$, $x_4$ with weights given by $\mathbf{u}_1$:

$$I_1 = .289x_1 + .408x_2 + .707x_3 + .5x_4 \tag{36}$$
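
The weights in (36) can be recovered from the covariance matrix with the simple iterative method. The sketch below uses the entries of (33); the iteration count, the starting vector of ones, and the use of $\lambda_1$ divided by the sum of the diagonal entries as the "share of variability" are our own assumptions for illustration.

```python
import math

C = [[0.86, 1.19, 2.02, 1.45],
     [1.19, 1.68, 2.86, 2.06],
     [2.02, 2.86, 5.05, 3.50],
     [1.45, 2.06, 3.50, 2.53]]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Simple iteration toward the dominant eigenvector, kept at length 1.
u = [1.0] * len(C)
for _ in range(100):
    y = mat_vec(C, u)
    norm = math.sqrt(sum(t * t for t in y))
    u = [t / norm for t in y]
lam = sum(a * b for a, b in zip(u, mat_vec(C, u)))

# Assumed measure: the dominant eigenvalue as a share of the total variance
# (the sum of the diagonal entries of C).
share = lam / sum(C[i][i] for i in range(len(C)))
print([round(t, 3) for t in u], round(lam, 2), round(share, 3))
```

The entries of `u` are the weights of the first principal component, and `share` gives a rough sense of how much of the variability that single index captures.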


The eigenvalue decomposition of a matrix A and the deflation method 
for finding successive eigenvalues and eigenvectors depended on special 
properties of A. There had to be a set of n linearly independent eigenvectors 
for the eigenvalue decomposition and A had to be symmetric for the deflation 
method to work. Symmetry is easy to recognize. What about linearly inde- 
pendent eigenvectors? The following theorem answers this question. It is a 
companion to Theorem 4 (which stated that an n-by-n symmetric matrix has 
n orthogonal eigenvectors). 
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Theorem 5 
(i) Eigenvectors associated with different eigenvalues are linearly 
independent. 
(ii) If the n-by-n matrix A has n distinct eigenvalues, any set of n 
eigenvectors, each associated with a different eigenvalue, will 
form a basis for n-space, and the results in Theorems 1, 2, and 3 
apply. 


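Proof. For explicitness, assume that n = 3 and let $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$ be
three eigenvectors of A associated with different eigenvalues $\lambda_1$, $\lambda_2$,
and $\lambda_3$, respectively. It is easy to show that $\mathbf{u}_1$ and $\mathbf{u}_2$ are linearly
independent (Exercise 14). Suppose that $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$ are not linearly
independent, so that $\mathbf{u}_3$ can be expressed as a unique linear combination
of $\mathbf{u}_1$ and $\mathbf{u}_2$: $\mathbf{u}_3 = c_1\mathbf{u}_1 + c_2\mathbf{u}_2$. Now we compute $A\mathbf{u}_3$ in two ways.

$$A\mathbf{u}_3 = \lambda_3\mathbf{u}_3 = \lambda_3(c_1\mathbf{u}_1 + c_2\mathbf{u}_2) \tag{37}$$

and

$$A\mathbf{u}_3 = A(c_1\mathbf{u}_1 + c_2\mathbf{u}_2) = \lambda_1c_1\mathbf{u}_1 + \lambda_2c_2\mathbf{u}_2 \tag{38}$$

The representation of $A\mathbf{u}_3$ as a linear combination of $\mathbf{u}_1$ and $\mathbf{u}_2$ is
unique. That is, the weights of $\mathbf{u}_1$ and $\mathbf{u}_2$ on the right sides of (37)
and (38) must be equal: $\lambda_3c_1 = \lambda_1c_1$ and $\lambda_3c_2 = \lambda_2c_2$. Since $\mathbf{u}_3$ is
nonzero, at least one of $c_1$, $c_2$ is nonzero, so $\lambda_3 = \lambda_1$
or $\lambda_3 = \lambda_2$. This contradiction proves that $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$ must be
linearly independent.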


The following is an example of a "defective" matrix to which
Theorem 5 does not apply.


Example 7. Matrix Without an
Eigenvector Basis


The matrix

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

has the characteristic polynomial $\det(A - \lambda I) = \lambda^2$
(check this), so its two eigenvalues are both 0: $\lambda_1 = \lambda_2 = 0$. A is not
symmetric and does not have two different eigenvalues. Thus Theorem
5 does not apply to A.

Any eigenvector $\mathbf{u}$ of A must satisfy $(A - 0I)\mathbf{u} = \mathbf{0}$. That is,

$$A\mathbf{u} = \mathbf{0} \quad \text{or} \quad \begin{aligned} 0u_1 + 1u_2 &= 0 \\ 0u_1 + 0u_2 &= 0 \end{aligned} \tag{39}$$

The first equation reduces to $u_2 = 0$, and the second equation is
vacuous. Thus a solution $\mathbf{u}$ to (39) can have any value for $u_1$ while $u_2$
must be 0. However, all such vectors $\mathbf{u} = [u_1, 0]$ are multiples of one
another, so the eigenvectors of A do not form a basis for 2-space.


Another issue that we have skirted is complex-valued eigenvalues and 
eigenvectors. Complex eigenvalues were encountered in our discussion of 
population models in Section 4.5. In many other applications complex num- 
bers arise naturally. (Incidentally, one can show that the eigenvalues of a 
symmetric matrix are always real.) 


What happens when a matrix is not square (so that it has no eigenvalues 
or eigenvectors)? Can we find something like an eigenvalue decomposition 
for these matrices just as a pseudoinverse substitutes for the inverse in a 
nonsquare matrix? The answer is yes. 

In the spirit of the development of the pseudoinverse of a nonsquare
matrix, we again turn to the n-by-n matrix $A^TA$, which is square and
symmetric [since entry (i, j) is just the scalar product of the ith and jth columns
of A]. From the eigenvalue decomposition of $A^TA$, one can obtain a
decomposition for A.


Theorem 6. Singular-Value Decomposition. For any m-by-n matrix $A$
with linearly independent columns, let $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$ be the
eigenvalues of $A^TA$ and $U$ be an n-by-n matrix whose columns $\mathbf{u}_i$ are
the associated (orthonormal) eigenvectors of $A^TA$. The $\mathbf{u}_i$ form a basis
for the row space of $A$.

Define the ith singular value $s_i$ to be $s_i = \sqrt{\lambda_i}$. Let $U'$ be an
m-by-n matrix with columns $\mathbf{u}'_i = (1/s_i)A\mathbf{u}_i$. The $\mathbf{u}'_i$ form an orthonormal
basis for the range of $A$.

Then $A$ can be decomposed in the form

$$A = U'D_sU^T \tag{40}$$

where $D_s$ is the diagonal matrix whose ith diagonal entry is $s_i$.


The proof of Theorem 6 is given in the Exercises. Recall from Theorem
4 of Section 5.3 that if $A$ has linearly independent columns, then $A^TA$ has
linearly independent columns. There is a generalized form of Theorem 6 for
linearly dependent columns.

The definition of the columns of $U'$ means that $U'$ has the matrix
formula

$$U' = AUD_s^{-1}$$


The factorization (40) leads to the following simple matrix decompo- 
sition (by the same argument that led up to Theorem 3). 


Corollary A. Let $A$ be as in Theorem 6, let $\mathbf{u}_i$ be the ith column of $U$, and
let $\mathbf{u}'_i$ be the ith column of $U'$. Then

$$A = s_1\mathbf{u}'_1 * \mathbf{u}_1 + s_2\mathbf{u}'_2 * \mathbf{u}_2 + \cdots + s_n\mathbf{u}'_n * \mathbf{u}_n \tag{41}$$

We also get from (40) a result like Theorem 1.


Corollary B. The computation $\mathbf{c} = A\mathbf{b}$ becomes $\mathbf{c}^* = D_s\mathbf{b}^*$ (or $c_i^* = s_ib_i^*$)
when $\mathbf{b}^* = U^T\mathbf{b}$ and $\mathbf{c}^* = U'^T\mathbf{c}$. That is, multiplying a vector
by $A$ reduces to scalar multiplication of the coordinates when we express
$\mathbf{b}$ in the proper row space coordinates and $\mathbf{c}$ in the proper column
space (range) coordinates.


The singular-value decomposition of A can be ‘‘inverted’’ to obtain 
the pseudoinverse of A. 


Corollary C. Let $A$ be an m-by-n matrix. Then the pseudoinverse of $A$
equals

$$A^+ = UD_s^{-1}U'^T \tag{42}$$

Recall that $D_s^{-1}$ is a diagonal matrix with entry $(i, i) = 1/s_i$.


We now give the singular-value decomposition for the two-refinery 
matrix discussed in Section 5.3. 


Example 8. Singular-Value Decomposition of
Two-Refinery Model


The two-refinery variant was

$$\begin{aligned} 20x_1 + 4x_2 &= b_1 \\ 10x_1 + 14x_2 &= b_2 \\ 5x_1 + 5x_2 &= b_3 \end{aligned} \quad\text{or}\quad A\mathbf{x} = \mathbf{b} \tag{43}$$

Let us compute the singular-value decomposition of $A$. First we form
$A^TA$:

$$A^TA = \begin{bmatrix} 525 & 245 \\ 245 & 237 \end{bmatrix} \tag{44}$$


We need to find the eigenvalues $\lambda_1$, $\lambda_2$ of $A^TA$, since the singular
values $s_1$, $s_2$ are their square roots. Further, the eigenvectors of $A^TA$
are the columns of the matrix $U$. By some method, discussed previously
or in the appendix to this section, we find that

$$\begin{aligned} \lambda_1 &\approx 665 \quad\text{and}\quad \mathbf{u}_1 = [.87, .49] \\ \lambda_2 &\approx 97 \quad\text{and}\quad \mathbf{u}_2 = [-.49, .87] \end{aligned} \tag{45}$$

So the singular values of $A$ are $s_1 = \sqrt{665} \approx 25.6$, $s_2 = \sqrt{97} \approx 9.8$, and

$$U = \begin{bmatrix} .87 & -.49 \\ .49 & .87 \end{bmatrix} \tag{46}$$

Figure 5.10 From Cliff Long, "Visualization of Matrix Singular Value
Decomposition," Mathematics Magazine, Vol. 56 (1983), pp. 161-167.


As an aside, we note that $U$ performs a rotation of about 30° (see Theorem
3 of Section 5.4).


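Next we compute $U' = [\mathbf{u}'_1, \mathbf{u}'_2]$, where $\mathbf{u}'_i = (1/s_i)A\mathbf{u}_i$. We
obtain

$$U' = \begin{bmatrix} .76 & -.65 \\ .60 & .74 \\ .26 & .19 \end{bmatrix} \tag{47}$$

Then the singular-value decomposition of $A = U'D_sU^T$ is

$$\begin{bmatrix} 20 & 4 \\ 10 & 14 \\ 5 & 5 \end{bmatrix} = \begin{bmatrix} .76 & -.65 \\ .60 & .74 \\ .26 & .19 \end{bmatrix} \begin{bmatrix} 25.6 & 0 \\ 0 & 9.8 \end{bmatrix} \begin{bmatrix} .87 & .49 \\ -.49 & .87 \end{bmatrix} \tag{48}$$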


As a sum of simple matrices [see (41)], (48) becomes

$$A = \begin{bmatrix} 16.9 & 9.5 \\ 13.4 & 7.5 \\ 5.8 & 3.3 \end{bmatrix} + \begin{bmatrix} 3.1 & -5.5 \\ -3.4 & 6.5 \\ -.8 & 1.7 \end{bmatrix} \tag{49}$$


The decomposition in (48) can be interpreted as follows. There
are two basic "input" and "output" units for the refinery model. The
first input unit consists of .87 barrel of petroleum for refinery 1 and
.49 barrel for refinery 2, and one such input unit yields 25.6 output
units, each consisting of .76 gallon of diesel oil, .60 heating oil, and
.26 gasoline. The second input unit consists of -.49 barrel (production
is "reversed") for refinery 1 and .87 barrel for refinery 2, and it yields
9.8 output units, each consisting of -.65 diesel, .74 heating, and .19
gasoline.
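
The construction in Theorem 6 can be traced numerically for the two-refinery matrix. The following is a minimal sketch, not the text's own program: it assumes a matrix with exactly two columns, so that the eigenvalues of $A^TA$ come from the quadratic formula, and the helper names (`transpose`, `matmul`, `svd_two_columns`) are illustrative choices. The eigenvector signs it picks may differ from those in (45), and the computed values can differ slightly from the rounded figures quoted in the example.

```python
import math

# Two-refinery matrix from (43): rows are products, columns are refineries.
A = [[20.0, 4.0], [10.0, 14.0], [5.0, 5.0]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def svd_two_columns(A):
    """Return (U_prime, s, U) with A = U_prime * diag(s) * U^T for an m-by-2 matrix A."""
    AtA = matmul(transpose(A), A)                 # symmetric 2-by-2 matrix A^T A
    p, q, r = AtA[0][0], AtA[0][1], AtA[1][1]
    mean, half_gap = (p + r) / 2, math.hypot((p - r) / 2, q)
    lams = [mean + half_gap, mean - half_gap]     # eigenvalues, largest first
    cols = []
    for lam in lams:
        # (q, lam - p) solves (A^T A - lam*I)u = 0 whenever q != 0 (assumed here).
        vx, vy = q, lam - p
        norm = math.hypot(vx, vy)
        cols.append((vx / norm, vy / norm))
    U = [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]   # eigenvectors as columns
    s = [math.sqrt(lam) for lam in lams]                       # singular values
    AU = matmul(A, U)
    U_prime = [[AU[i][j] / s[j] for j in range(2)] for i in range(len(A))]
    return U_prime, s, U

U_prime, s, U = svd_two_columns(A)
D = [[s[0], 0.0], [0.0, s[1]]]
A_rebuilt = matmul(matmul(U_prime, D), transpose(U))   # should reproduce A, as in (40)
print([round(v, 2) for v in s])
```

Multiplying the three factors back together is a quick check on (40): the rebuilt matrix should agree with $A$ up to roundoff regardless of the sign chosen for each eigenvector.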


A more impressive use of the singular-value decomposition is given
in Figure 5.10. Figure 5.10 (top) shows a 49-by-36 digitized image of a
bust of Abe Lincoln [entry (i, j) is the height of the bust in that position].
The remaining figures in this set show the digitized image produced by the
matrix $A_k$, the sum of the first k simple matrices in the singular-value
decomposition of $A$.


Section 5.5 Exercises 


Summary of Exercises

Exercises 1—5 involve the diagonalization of matrices presented in Theorem 
2. Exercises 6-13 involve the eigenvalue decomposition of matrices into a 
sum of simple matrices (many require deflation to find the eigenvalues). 
Exercises 14 and 15 are about independence of eigenvectors. Exercise 16 
involves defective matrices. Exercises 17—22 involve computing the singu- 
lar-value decomposition. Exercises 23—25 prove the results in Theorem 6. 


1. Compute the representation $UD_\lambda U^{-1}$ of Theorem 2 for the following
matrices whose eigenvalues and largest eigenvector you were asked to
determine in Exercise 23 of Section 3.1.


4 0 1 2 es ee Me Be 
(a) ; 4 (0) P 4 (c) é % (d) e ‘| 


2. For a starting vector of p = [10, 10], compute p"” = A'°p for each 
matrix A in Exercise 1. 
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x 


10. 


Compute A° for each matrix in Exercise 1 using the formula A> = 
UD;U~'. ' 


4. (a) Given that $A = UD_\lambda U^{-1}$, prove that $A^2 = UD_\lambda^2U^{-1}$.

(b) Use induction to prove that $A^k = UD_\lambda^kU^{-1}$.


5. (a) Obtain a formula for $A^{-1}$ similar to $A = UD_\lambda U^{-1}$.

Hint: Only the matrix $D_\lambda$ will be different.

(b) Verify your formula in part (a) for the matrix $A$ in Example 1.


6. Give the eigenvalue decomposition into a sum of two simple matrices
(Theorem 3) for each matrix in Exercise 1.


7. Give the eigenvalue decomposition into a sum of two simple matrices
for the following symmetric matrices.


1 4 3 —4 =], =«§ 
i t 4 o eS s “ is: ry 


8. Use the deflation method to compute all eigenvalues and eigenvectors
of the following symmetric matrices.


1 6 gag 01 
a E 4 ns i i 7 | , 


l 2 3 Rol Be 0 l 

(d) | 2 a TB. VO kf (f) = 3 EO =Z 
es 0 , 2% 0 | ) 1 

} 2 l 0 


9. Using a software package for finding eigenvectors and eigenvalues,
determine the eigenvalue decomposition into simple matrices for the
following symmetric matrices.


; #20 11 es a a 
om eee ae ieee a =i Say. Re 
01 0 sa ep eT ae & Oa 

ion & * 


10. Approximate the following symmetric digital pictures by:

(i) The first simple matrix in the eigenvalue decomposition.

(ii) The sum of the first two simple matrices in the eigenvalue
decomposition.

Use deflation (or a software package) to determine the first two
eigenvectors and eigenvalues.


11. Explain how the symmetry in the 8-by-8 digital picture (25) in Example
5 would allow one to find the eigenvalue decomposition for the 4-by-4
upper right corner submatrix $A'$ and use this to get the eigenvalue
decomposition for the whole matrix.

Find the dominant eigenvalue and associated (normalized) eigenvector
for $A'$; approximate $A'$ by the first simple matrix in the eigenvalue
decomposition.


12. Verify that the eigenvalues (34) and dominant eigenvector in Example
6 are correct. Determine the second principal component in Example 6
(the normalized eigenvector for $\lambda_2$).


13. Determine the first principal component for each of the following
covariance matrices. In each case, tell how well it accounts for the
variability (how well does $\lambda_1\mathbf{u}_1 * \mathbf{u}_1$ approximate the matrix?).


TEM ta 24 0.6 3.1 1.5 
(a) | 1.1 2.0 1.5 hy (O8 41 08 12 
0.5 1.5 4.2 71 OS 27 52 

PS. 12° 3.2 32 


14. In the proof of Theorem 4, show that $\mathbf{u}_1$ and $\mathbf{u}_2$ are linearly independent
by supposing the opposite, that $\mathbf{u}_2 = r\mathbf{u}_1$, and obtaining a contradiction
when $A\mathbf{u}_2 \ne A(r\mathbf{u}_1)$.


15. Two n-by-n matrices $A$ and $B$ are called similar if there exists an
invertible n-by-n matrix $U$ such that $A = UBU^{-1}$ (or equivalently,
$B = U^{-1}AU$).

(a) Show that similar matrices have the same set of eigenvalues.
(b) Show that if $A$ has a set of n linearly independent eigenvectors,
then $A$ is similar to a diagonal matrix.


16. Find the eigenvalues and as many eigenvectors as you can for the
following defective matrices.


ed Oa a ae 
So) aa | ee i) 10 2 1 
002 


17. Find the singular-value decomposition [equation (40)] for the following
matrices and use the decomposition to write the matrices as a sum of
simple matrices.


2 i. 2 a 9 L* 2 
(a) | 1 (b) |} 3 3 (c) zt 39 (Gh ih2- 3 
Wy) L. 9 nas 3 4 

4 0 


18. For a refinery problem with p refineries and q products as in Example
8, suppose that the coefficient matrix is the matrix in each part of
Exercise 17. In each case, interpret the singular-value decomposition
in terms of units of "input" and "output," as was done at the end of
Example 8.


19. Use the formula for the pseudoinverse in Corollary C to compute the
pseudoinverse of each matrix in Exercise 17.


20. With the help of deflation or a software package to find eigenvectors
of $A^TA$, find the singular-value decomposition for the following
matrices and use the decomposition to approximate these matrices as a sum
of two simple matrices.


2 QO =1 As Be xd 
1 a \ 4: C1542 
(a) 
4 =] 2 (bhi.s cb. 0 
5 0 -!1 4S. 62h ©Q 
Sha | ail 


21. If $A$ is a symmetric matrix, show that the singular-value decomposition
reduces to the standard eigenvalue decomposition in Theorem 2.


22. Verify the formula for the pseudoinverse in Corollary C.


23. Show that the n orthonormal eigenvectors of $A^TA$ in $U$ (in the
singular-value decomposition) are a basis for the row space of the m-by-n
matrix $A$.

Hint: What is the dimension of the row space of $A$ if its columns are
linearly independent?


24. Show that the columns of $U'$ (in the singular-value decomposition) must
form a basis for the range of $A$ (the column space of $A$).

Hint: Use Exercise 23, and Exercise 30 of Section 5.3.


25. Verify the singular-value decomposition (40) by showing that

$$D_s = U'^TAU \tag{*}$$

(a) First show that (40) and (*) are equivalent matrix equations.

Hint: Use the fact that (by orthonormality) $U^TU = U'^TU' = I$.

(b) Prove (*) by verifying the following sequence of matrix equations:

$$\begin{aligned} U'^TAU &= (AUD_s^{-1})^TAU \\ &= (D_s^{-1}U^TA^T)AU = D_s^{-1}U^T(A^TAU) \\ &= D_s^{-1}U^T(UD_\lambda) = D_s^{-1}D_\lambda = D_s \end{aligned}$$

and Eigenvectors 


In this appendix we present two methods for finding eigenvalues and
eigenvectors. The first is a way to speed up the search for an eigenvalue $\lambda$ and
associated eigenvector $\mathbf{u}$ once we have a rough approximation to $\lambda$, say,
obtained by guesswork or by a few rounds of the iterative method $A^k\mathbf{x}$. The
following basic theorems about eigenvalues are needed.


Theorem 1

(i) For any nonzero integer k, $A$'s eigenvectors are eigenvectors of
$A^k$. If $\lambda$ is an eigenvalue of $A$, then $\lambda^k$ is an eigenvalue of $A^k$. In
particular, $1/\lambda$ is an eigenvalue for $A^{-1}$. (If k < 0, we assume
that $A^{-1}$ exists.)

(ii) For any scalar r, $A$ and $A - rI$ have the same set of eigenvectors
and $\lambda$ is an eigenvalue for $A$ if and only if $\lambda - r$ is an eigenvalue
of $A - rI$.


Proof. We give a proof of part (i) for k = -1 [positive k and Theorem
1, part (ii) are left as exercises]. Suppose that $\mathbf{u}$ is an eigenvector of
$A$ with associated eigenvalue $\lambda$. Then

$$\mathbf{u} = I\mathbf{u} = (A^{-1}A)\mathbf{u} = A^{-1}(A\mathbf{u}) = A^{-1}(\lambda\mathbf{u}) = \lambda A^{-1}\mathbf{u} \tag{1}$$

Dividing both sides of (1) by $\lambda$ (which is nonzero, since $A^{-1}$ exists), we have

$$\frac{1}{\lambda}\mathbf{u} = A^{-1}\mathbf{u} \tag{2}$$

So $1/\lambda$ is an eigenvalue of $A^{-1}$ with eigenvector $\mathbf{u}$, as claimed.


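Theorem 2. The rate at which the iterative method $A^k\mathbf{x}$ converges to a
multiple of a dominant eigenvector $\mathbf{u}_1$ of $A$ is proportional to $|\lambda_1|/|\lambda_2|$.


Proof. Any vector $\mathbf{x}$ can be written as a linear combination of the
eigenvectors (assuming that the n-by-n matrix $A$ has n linearly
independent eigenvectors)

$$\mathbf{x} = c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \cdots + c_n\mathbf{u}_n$$

Then

$$\begin{aligned} A^k\mathbf{x} &= c_1A^k\mathbf{u}_1 + c_2A^k\mathbf{u}_2 + \cdots + c_nA^k\mathbf{u}_n \\ &= c_1\lambda_1^k\mathbf{u}_1 + c_2\lambda_2^k\mathbf{u}_2 + \cdots + c_n\lambda_n^k\mathbf{u}_n \\ &= \lambda_1^k\left[c_1\mathbf{u}_1 + c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\mathbf{u}_2 + \cdots + c_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\mathbf{u}_n\right] \end{aligned} \tag{3}$$

The last line of (3) shows how the size of $\lambda_2/\lambda_1$ affects the convergence
to $\lambda_1^kc_1\mathbf{u}_1$.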
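
The effect described in Theorem 2 is easy to observe numerically. The sketch below is illustrative only: the two 2-by-2 matrices are hypothetical test cases that do not appear in the text, and the function name `power_iteration` is an assumption. Both matrices have eigenvectors [1, 1] and [1, -1], so the only difference between the two runs is how close the second eigenvalue is to the dominant one.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def power_iteration(A, x, rounds):
    """Apply x <- Ax repeatedly, rescaling by the largest entry each round."""
    for _ in range(rounds):
        x = matvec(A, x)
        scale = max(abs(v) for v in x)
        x = [v / scale for v in x]
    # Return the Rayleigh quotient (eigenvalue estimate) and the eigenvector estimate.
    return dot(x, matvec(A, x)) / dot(x, x), x

# Hypothetical test matrices (not from the text).
well_separated = [[2.0, 1.0], [1.0, 2.0]]     # eigenvalues 3 and 1
close_together = [[10.0, 1.0], [1.0, 10.0]]   # eigenvalues 11 and 9

fast_estimate, _ = power_iteration(well_separated, [1.0, 0.0], 10)
slow_estimate, _ = power_iteration(close_together, [1.0, 0.0], 10)
print(abs(fast_estimate - 3.0), abs(slow_estimate - 11.0))
```

After the same ten rounds, the estimate for the well-separated matrix should be far more accurate than the estimate for the matrix whose two eigenvalues are close, which is the behavior the last line of (3) predicts.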


Suppose that $A$ has an inverse $A^{-1}$. Since $\lambda$ is an eigenvalue of $A$ if
and only if $1/\lambda$ is an eigenvalue of $A^{-1}$ and both have the same eigenvectors,
we have


Corollary. An eigenvector $\mathbf{u}_n$ associated with the smallest eigenvalue $\lambda_n$ of
$A$ can be found (if $\lambda_n$ is unique) by applying the iterative method to
find the largest eigenvalue of $A^{-1}$:

$$\mathbf{y}^{(k)} = A^{-1}\mathbf{y}^{(k-1)} \tag{4}$$

The rate of convergence will be proportional to $|\lambda_{n-1}|/|\lambda_n|$, where $\lambda_{n-1}$
is the second smallest eigenvalue of $A$.


Rather than compute the inverse $A^{-1}$ and then use (4), we can write
(4) as

$$A\mathbf{y}^{(k)} = \mathbf{y}^{(k-1)} \tag{5}$$

In (5), we are applying the regular iterative method in reverse. To find $\mathbf{y}^{(k)}$
given $\mathbf{y}^{(k-1)}$, we solve (5) by Gaussian elimination (saving the matrices L
and U of Section 3.2 to use again for each successive $\mathbf{y}^{(k)}$). We call the
method for finding the smallest eigenvalue of $A$ and an associated
eigenvector using (5) the inverse iterative method.

Surprisingly, the inverse iterative method yields a faster way to com- 
pute the dominant eigenvector than the (forward) iterative method. 


Theorem 3. Shifted Inverse Iterative Method. If $\sigma$ is an approximate value
for an eigenvalue of a square matrix $A$, an associated eigenvector $\mathbf{u}$
can be found quickly by applying the inverse iterative method to
$A - \sigma I$. The eigenvalue can then be obtained from $\mathbf{u}$ using the
Rayleigh quotient. If $\lambda_i$ is the true value of the eigenvalue and $\lambda_j$ is
the next-closest eigenvalue of $A$, the rate of convergence is
$|\lambda_j - \sigma|/|\lambda_i - \sigma|$.


This method is called the shifted inverse iterative method because of
the "shift" of eigenvalues caused by $-\sigma I$. Recall from Theorem 1,
part (ii), that $A - \sigma I$ has the same eigenvectors as $A$ and its eigenvalues are
shifted by $\sigma$. If $\sigma$ is close to $\lambda_i$, then $\lambda_i - \sigma$ will be the smallest eigenvalue
of $A - \sigma I$ by far, and the rate of convergence $|\lambda_j - \sigma|/|\lambda_i - \sigma|$ of the
inverse iterative method will be very fast (i.e., two or three iterations should
suffice).


We should note that computations are very unstable with $A - \sigma I$,
since when $\lambda_i$ is close to $\sigma$, $1/(\lambda_i - \sigma)$ is an eigenvalue of $(A - \sigma I)^{-1}$ of
immense size. This implies that $\|(A - \sigma I)^{-1}\|$, and hence the condition number
of $A - \sigma I$, are very large. However, the only effect that this instability has on
the computations of the inverse iterative method is a distortion of the total
size of the $\mathbf{y}^{(k)}$ but not the direction of these vectors (the total size does not
concern us, since we use scaling).


Hybrid Deflation Procedure to Find All 
Eigenvalues and Eigenvectors of a 
Symmetric Matrix 


Step 1. Starting with $A_1 = A$, use the (forward) iterative method a
few times on $A_i$ to get an approximation $\sigma$ to the dominant eigenvalue $\lambda_i$ of
$A_i$, together with an approximate eigenvector $\mathbf{v}$.


Step 2. Starting with $\mathbf{v}$, use the shifted inverse iterative method on
$A_i - \sigma I$ to get more accurate values for the unit-length eigenvector $\mathbf{u}_i$.
Obtain $\lambda_i$ from $\mathbf{u}_i$ by the Rayleigh quotient $\mathbf{u}_i \cdot A\mathbf{u}_i / \mathbf{u}_i \cdot \mathbf{u}_i$.


Step 3. Compute $A_{i+1} = A_i - \lambda_i\mathbf{u}_i * \mathbf{u}_i$, set i = i + 1, and if i ≤ n,
go to step 1.
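
The three steps above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the text's program: the symmetric test matrix and the starting vector are hypothetical, the round counts are arbitrary defaults, and the rough eigenvalue is nudged by a small amount (an added safeguard, not part of the procedure as stated) so that $A_i - \sigma I$ never becomes exactly singular.

```python
import math

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def unit(x):
    norm = math.sqrt(dot(x, x))
    return [v / norm for v in x]

def solve(M, b):
    """Solve Mx = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def hybrid_deflation(A, start, power_rounds=3, inverse_rounds=4):
    n = len(A)
    Ai = [row[:] for row in A]
    pairs = []
    for _step in range(n):
        # Step 1: a few forward iterations give a rough eigenvalue and eigenvector.
        v = unit(start)
        for _k in range(power_rounds):
            v = unit(matvec(Ai, v))
        sigma = dot(v, matvec(Ai, v)) + 1e-6   # nudge (assumption) keeps Ai - sigma*I invertible
        # Step 2: shifted inverse iteration, solving (Ai - sigma*I) y = v each round.
        shifted = [[Ai[r][c] - (sigma if r == c else 0.0) for c in range(n)] for r in range(n)]
        for _k in range(inverse_rounds):
            v = unit(solve(shifted, v))
        lam = dot(v, matvec(Ai, v))            # Rayleigh quotient; v has unit length
        pairs.append((lam, v))
        # Step 3: deflate by subtracting the simple matrix lam * (v v^T).
        Ai = [[Ai[r][c] - lam * v[r] * v[c] for c in range(n)] for r in range(n)]
    return pairs

# Hypothetical symmetric test matrix; its eigenvalues are 2 + sqrt(2), 2, and 2 - sqrt(2).
S = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
pairs = hybrid_deflation(S, [1.0, 0.5, 0.25])
print([round(lam, 6) for lam, _ in pairs])
```

The sketch does not handle a matrix with a zero eigenvalue or a starting vector orthogonal to an eigenvector; a fuller implementation would need to check for both.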


We now present an almost magical procedure to find all the eigenvalues 
at once of a square matrix A with distinct eigenvalues. 
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QR Method for Finding All Eigenvalues of a 
Matrix with Distinct Eigenvalues 


Step 1. Let $A_0 = A$. For successive k,

(a) Given $A_k$, compute the QR decomposition $A_k = Q_kR_k$ (by the
Gram-Schmidt orthogonalization procedure); and then

(b) Set $A_{k+1} = R_kQ_k$ and go to step (a).

Stop when the entries below the main diagonal of $A_k$ are all almost 0
(entries above the main diagonal do not converge to 0 unless $A$ is
symmetric).


Step 2. The entries on the main diagonal of the last $A_k$ will be
approximately the eigenvalues of $A$ (in order of decreasing absolute value).
Use the shifted inverse iterative method to find the eigenvector [or if
the eigenvalue $\lambda$ is essentially exact, solve the homogeneous system
$(A - \lambda I)\mathbf{u} = \mathbf{0}$].
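
Step 1 of the QR method can be sketched as follows. This is a bare-bones illustration rather than a production routine: it uses the modified form of Gram-Schmidt for the decomposition, applies no shifts or preprocessing, and assumes the matrix is invertible with eigenvalues of distinct absolute value. The function names and the stopping tolerance are illustrative choices, and the test matrix is the Leslie matrix with rows [0, 4, 1], [.4, 0, 0], [0, .6, 0].

```python
import math

def transpose(M):
    return [list(c) for c in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def qr_gram_schmidt(A):
    """Return (Q, R) with A = QR, orthonormalizing the columns of A in order."""
    n = len(A)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(zip(*A)):
        w = list(a)
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qk * wk for qk, wk in zip(q, w))   # projection onto q_i
            w = [wk - R[i][j] * qk for wk, qk in zip(w, q)]
        R[j][j] = math.sqrt(sum(wk * wk for wk in w))
        q_cols.append([wk / R[j][j] for wk in w])
    return transpose(q_cols), R

def qr_eigenvalues(A, tol=1e-3, max_rounds=1000):
    """Iterate A_{k+1} = R_k Q_k until all below-diagonal entries are under tol."""
    Ak = [row[:] for row in A]
    n = len(A)
    for rounds in range(1, max_rounds + 1):
        Q, R = qr_gram_schmidt(Ak)
        Ak = matmul(R, Q)
        if all(abs(Ak[r][c]) < tol for r in range(n) for c in range(r)):
            break
    return [Ak[i][i] for i in range(n)], rounds

leslie = [[0.0, 4.0, 1.0], [0.4, 0.0, 0.0], [0.0, 0.6, 0.0]]
eigs, rounds = qr_eigenvalues(leslie)
print([round(e, 3) for e in eigs], rounds)
```

Because each $A_{k+1} = R_kQ_k = Q_k^TA_kQ_k$ is similar to $A_k$, the diagonal entries that emerge are estimates of the eigenvalues of the original matrix.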


Example 1. Example of QR and Shifted Inverse
Iterative Method


Let us use the QR method on the matrix $L$ in the Leslie model from
Example 1 of Section 4.5.

$$\mathbf{x}' = L\mathbf{x}, \quad\text{where}\quad L = \begin{bmatrix} 0 & 4 & 1 \\ .4 & 0 & 0 \\ 0 & .6 & 0 \end{bmatrix}$$


We noted in Section 4.5 that $L^k\mathbf{x}$ takes a long time to converge to a
multiple of the dominant eigenvector $\mathbf{u}_1$. Since convergence is slow,
Theorem 2 says that the second largest eigenvalue $\lambda_2$ must be close to
$\lambda_1$ in absolute value. The QR method is also slow for this $L$. If we run it
until all below-diagonal entries are < .001 (then the eigenvalues are
accurate to about three decimal places), it requires 60 iterations and yields

$$A_{60} = \begin{bmatrix} 1.334 & -3.586 & -1.142 \\ .000 & -1.182 & -.394 \\ .000 & .000 & -.152 \end{bmatrix} \tag{6}$$

So $\lambda_1 = 1.334$, $\lambda_2 = -1.182$, $\lambda_3 = -.152$. (As a check, the three
values sum to 0, the sum of the diagonal entries of $L$.) Solving the homogeneous
system $(L - \lambda_iI)\mathbf{u} = \mathbf{0}$ will give a corresponding eigenvector.

Let us next try the shifted iterative method. Starting with x = 
[100, 50, 30], we gave a table in Section 4.5 of iterates up to L7°a 
(Table 4.3), at which point there was still a little cyclic behavior. 
Consider the ninth, tenth, and eleventh iterates: 


$$\mathbf{x}^{(9)} = [2021, 493, 279], \quad \mathbf{x}^{(10)} = [2250, 808, 295], \quad \mathbf{x}^{(11)} = [3529, 900, 485]$$

When we compute the Rayleigh quotients for k = 9 and k = 10, we
get

$$\begin{aligned} \frac{\mathbf{x}^{(9)} \cdot \mathbf{x}^{(10)}}{\mathbf{x}^{(9)} \cdot \mathbf{x}^{(9)}} &= \frac{5{,}027{,}899}{4{,}414{,}515} = 1.138 \\ \frac{\mathbf{x}^{(10)} \cdot \mathbf{x}^{(11)}}{\mathbf{x}^{(10)} \cdot \mathbf{x}^{(10)}} &= \frac{8{,}668{,}230}{5{,}802{,}389} = 1.493 \end{aligned} \tag{7}$$

The average of our two estimates is (1.138 + 1.493)/2 = 1.32. Presumably,
we are alternating above and below the true eigenvalue, so
this average value of 1.32 should be close to the true eigenvalue. Now
we apply the shift step of computing $L - 1.32I$:


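$$L' = L - 1.32I = \begin{bmatrix} 0 & 4 & 1 \\ .4 & 0 & 0 \\ 0 & .6 & 0 \end{bmatrix} - \begin{bmatrix} 1.32 & 0 & 0 \\ 0 & 1.32 & 0 \\ 0 & 0 & 1.32 \end{bmatrix} = \begin{bmatrix} -1.32 & 4 & 1 \\ .4 & -1.32 & 0 \\ 0 & .6 & -1.32 \end{bmatrix}$$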

Let us scale $\mathbf{x}^{(11)}$ by dividing its entries by the largest entry, 3529, to
obtain $\mathbf{x}' = [1, .255, .137]$. We use $\mathbf{x}'$ as the starting vector for
backward iteration with $L'$ (in search of an eigenvector associated with
the smallest eigenvalue of $L'$). After two rounds we have the vector
(rounded to integers) [4658, 1396, 628], which divided by the sum of
its entries (to be like a population probability distribution) yields

$$\mathbf{u}_1 = [.697, .209, .094] \tag{8}$$

Computing $L\mathbf{u}_1$, we obtain

$$L\mathbf{u}_1 = [.930, .278, .125] = 1.334\mathbf{u}_1 \tag{9}$$

so $\lambda_1 = 1.334$. Further, (9) confirms that $\mathbf{u}_1$ is an eigenvector.

If we wanted to get all eigenvalues and associated eigenvectors
for $L$, we could use 10 or 15 rounds of the QR method to get estimates
for each eigenvalue and then use the shifted inverse method to home
in on the associated eigenvector of each eigenvalue (any vector can be
used as the starting vector for the inverse iterative method).
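
The shifted inverse step of this example can be retraced with a short script. This is a sketch under stated assumptions: the linear systems are solved by a small Gaussian elimination routine written for the purpose (the helper names are illustrative), the matrix is the Leslie matrix with rows [0, 4, 1], [.4, 0, 0], [0, .6, 0], and the shift 1.32 and starting vector [1, .255, .137] are taken from the example.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def solve(M, b):
    """Solve Mx = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

L = [[0.0, 4.0, 1.0], [0.4, 0.0, 0.0], [0.0, 0.6, 0.0]]
sigma = 1.32                                   # shift from the averaged Rayleigh quotients
L_shifted = [[L[r][c] - (sigma if r == c else 0.0) for c in range(3)] for r in range(3)]

y = [1.0, 0.255, 0.137]                        # scaled eleventh iterate from the example
for _ in range(2):                             # two rounds of backward iteration
    y = solve(L_shifted, y)

total = sum(y)
u1 = [v / total for v in y]                    # normalize so the entries sum to 1
Lu1 = matvec(L, u1)
ratios = [a / b for a, b in zip(Lu1, u1)]      # each ratio estimates the eigenvalue
print([round(v, 3) for v in u1], [round(r, 3) for r in ratios])
```

If the three ratios agree closely, the vector is (to that accuracy) an eigenvector, which is the same check that (9) performs by hand.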


The proof of convergence for the QR method is beyond the scope of
this book (see G. W. Stewart, Introduction to Matrix Computations,
Academic Press, 1973, and the classic textbook by J. H. Wilkinson, The
Algebraic Eigenvalue Problem, Oxford University Press, 1965, for a fuller
discussion of the QR method).

The convergence is not fast, especially when two eigenvalues are close
together (in absolute value). The computation of each stage is slow
(proportional to $n^3$), but schemes are available to "preprocess" $A$ so that the
successive QR decompositions can be computed quickly. Other shortcuts further
speed this procedure, including shifts (as in the shifted inverse iterative
method). This method works in some cases where there are multiple
eigenvalues (of the same absolute value).


A related method called the LU method reverses the matrices in the
LU decomposition of a matrix the way the QR method reverses the matrices
in a QR decomposition (see the books mentioned above). The LU method
also converges for any matrix with distinct eigenvalues.


Section 5.5 Appendix Exercises 


Summary of Exercises 


Exercises 1—5 relate to Theorem 1. Exercises 6—8 illustrate the hybrid de- 
flation procedure. 


1. (a) Show that for any positive integer k, any eigenvector for the square
matrix $A$ is also an eigenvector for $A^k$.

(b) Also show that if $A^{-1}$ exists, part (a) is true for negative integers.


(a) Show that if A~' exists, any eigenvector for A“ is also an eigen- 
vector for A. 
(b) The existence of A~' is essential for the result in part (a). Verify 


0 | 
this by showing that for A = 0 ; , 1 is an eigenvector for A’ 
but not for A. 


3. (a) Show that for any positive integer k, if $\lambda$ is an eigenvalue of the
square matrix $A$, then $\lambda^k$ is an eigenvalue of $A^k$.

(b) Also show that if $A^{-1}$ exists, part (a) is true for negative integers.


4. Show that for any scalar r, $A$ and $A - rI$ have the same set of
eigenvectors.


5. Show that for any scalar r, $\lambda$ is an eigenvalue for $A$ if and only if
$\lambda - r$ is an eigenvalue for $A - rI$.


6. Use the inverse power method to find the dominant eigenvalue and
eigenvector for the following matrices.


4 0 Re 7 01 
” b 4 ©) ; ‘ © | ‘4 ~ ik 4 


7. Use the hybrid deflation procedure to find all eigenvalues and associated
eigenvectors for the following symmetric matrices.


1 6 a: <8 01 
= b | ” & 1 © | 
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I 2 3 ie | coe a 0 l 

(d) | 2 cob fe) Ph a2 (f) =4 3 y <2 
ey 0 at oR 0 l ) l 
 o=2 l 0 


8. Apply the QR method (using the program in Exercise 10 or a software 
package) to find the eigenvalues and associated eigenvectors for the 
matrices in Exercise 7. 


Programming Projects 
9. Write a program to implement the hybrid deflation procedure. 


10. Write a program to implement the QR method. 


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A Brief History 
of Matrices and 
Linear Algebra 


Matrices and linear algebra did not grow out of the study of coefficients of 
systems of linear equations, as one might guess. Arrays of coefficients led 
mathematicians to develop determinants, not matrices. Leibniz, coinventor 
of calculus, used determinants in 1693, about one hundred and fifty years 
before the study of matrices in their own right. Cramer presented his deter- 
minant-based formula for solving systems of linear equations in 1750, and 
Gauss developed Gaussian elimination around 1820. These events occurred 
before matrix notation even existed. As an aside, we note that Gaussian 
elimination was for years considered part of the development of geodesy, 
not mathematics; the Gauss—Jordan method, which we called elimination by 
pivoting, first appeared in a handbook on geodesy. 

For matrix algebra to develop, one needed two things: (i) the proper
notation, such as $a_{ij}$ and $A$; and (ii) the definition of matrix multiplication.
It is interesting that both of these critical factors occurred at about the same 
time, around 1850, and in the same country, England. Except for Newton’s 
invention of calculus, the major mathematical advances in the seventeenth, 
eighteenth, and early nineteenth centuries were all made by continental math- 
ematicians, names such as Bernoulli, Cauchy, Euler, Gauss, and Laplace. 
But in the mid-nineteenth century, English mathematicians pioneered the 
study of the underlying structure of various algebraic systems. For example, 
Augustus DeMorgan and George Boole developed the algebra of sets (Boo- 
lean algebra) in which symbols were used for propositions and abstract 
elements. 

The introduction of matrix notation and the invention of the word
"matrix" were motivated by attempts to develop the right algebraic language
for studying determinants. In 1848, J. J. Sylvester introduced the term
"matrix," the Latin word for "womb," as a name for an array of numbers. He
used "womb" because he viewed a matrix as a generator of determinants.
That is, every subset of k rows and k columns in a matrix generated a
determinant (associated with the submatrix formed by those rows and
columns).

In search of good notation for working with determinants, Sylvester
in 1851 proposed writing a square matrix in the form

$$\begin{matrix} a_1\alpha_1 & a_1\alpha_2 & \cdots & a_1\alpha_n \\ a_2\alpha_1 & a_2\alpha_2 & \cdots & a_2\alpha_n \\ \vdots & \vdots & & \vdots \\ a_n\alpha_1 & a_n\alpha_2 & \cdots & a_n\alpha_n \end{matrix} \tag{1}$$

with each entry represented by a product of symbols. He also introduced the
shorthand notation for a square matrix of

$$\begin{matrix} a_1 & a_2 & \cdots & a_n \\ \alpha_1 & \alpha_2 & \cdots & \alpha_n \end{matrix} \tag{2}$$

He referred to the $a$'s and $\alpha$'s as umbrae, or ideal elements. Using this
umbral notation, Sylvester then wrote the determinant of (2), which involves
summing the signed products of all permutations of the $a$'s with the $\alpha$'s, as

& Go &@, ; 
As Ye See DF 
Soon after the introduction of (1), the two symbols $a$ and $\alpha$ were merged
into one with double subscripts, $a_{ij}$ (Cauchy had actually used $a_{ij}$ in 1812,
but the notation was not accepted then).


Matrix algebra grew out of work by Arthur Cayley in 1855 on linear
transformations. Given transformations,

$$T_1:\ \begin{aligned} x' &= ax + by \\ y' &= cx + dy \end{aligned} \qquad\qquad T_2:\ \begin{aligned} x'' &= \alpha x' + \beta y' \\ y'' &= \gamma x' + \delta y' \end{aligned}$$

he considered the transformation obtained by performing $T_1$ and then
performing $T_2$.

$$T_2T_1:\ \begin{aligned} x'' &= (\alpha a + \beta c)x + (\alpha b + \beta d)y \\ y'' &= (\gamma a + \delta c)x + (\gamma b + \delta d)y \end{aligned}$$

In studying ways to represent this composite transformation, he was led to
define matrix multiplication: The matrix of coefficients for the composite
transformation $T_2T_1$ is the product of the matrix for $T_2$ times the matrix for
$T_1$. Cayley went on to study the algebra of these compositions (matrix
algebra), including matrix inverses. The use of a single symbol $A$ to
represent the matrix of a transformation was essential notation of this new
algebra. A link between matrix algebra and determinants was quickly
established with the result: $\det(AB) = \det(A)\det(B)$. But Cayley believed that
matrix algebra would grow to overshadow the theory of determinants. He
wrote, "There would be many things to say about this theory of matrices
which should, it seems to me, precede the theory of determinants."

It is a curious sidelight to this discussion that another prominent English
mathematician of this time was Charles Babbage, who built the first modern
calculating machine. Abstracting the mechanics of computation as well as
its algebraic structure and notation seems to have been all part of the same
general intellectual development in mathematics at that time.

Mathematicians also tried to develop an algebra of vectors, but there 
was no natural definition for the product of two vectors. The first vector 
algebra, involving a noncommutative vector product, was proposed by Her- 
mann Grassmann in 1844. Later, Grassmann introduced what we called 
simple matrices, formed by a column vector times a row vector. 

Matrices remained closely associated with linear transformations and, 
from the theoretical viewpoint, were by 1900 just a finite-dimensional sub- 
case of an emerging general theory of linear transformations. Matrices were 
also viewed as a powerful notation, but after an initial spurt of interest, were 
little studied in their own right. More attention was paid to vectors, which 
are basic mathematical elements of physics as well as many areas of math- 
ematics. The modern definition of a vector space was introduced by Peano 
in 1888. Abstract vector spaces, whose elements were functions or even 
linear transformations, soon followed. 

Interest in matrices, with emphasis on their numerical analysis,
reemerged after World War II with the development of modern digital
computers. Von Neumann and Goldstine in 1947 introduced condition numbers
in analyzing roundoff error. Alan Turing, the other giant (with von
Neumann) in the development of stored-program computers, gave the LU
decomposition of a matrix in 1948. The usefulness of the QR decomposition
was realized a decade later.


References 


Bell, E. T., The Development of Mathematics. McGraw-Hill, New York, 1940. 
Cajori, F., A History of Mathematical Notations, Vol. 2. Open Court Publishing
Company, Chicago, 1929.




Text and Software 
References 


Reference Texts 


Introductory Linear Algebra 


Anton, H., Elementary Linear Algebra, 3rd ed. Wiley, New York, 1981. 


Campbell, S., Linear Algebra with Applications. Appleton-Century-Crofts, New 
York, 1971. 


Gewirtz, A., H. Sitomer, and A. W. Tucker, Constructive Linear Algebra. Prentice- 
Hall, Englewood Cliffs, N.J., 1974. 


Grossman, S., Elementary Linear Algebra, 2nd ed. Wadsworth, Belmont, Calif., 
1984. 


Kolman, B., Elementary Linear Algebra, 4th ed. Macmillan, New York, 1986. 
Kumpel, P., and J. Thorpe, Linear Algebra. W. B. Saunders, Philadelphia, 1983. 


Nicholson, W. K., Elementary Linear Algebra with Applications. Prindle, Weber 
& Schmidt, Boston, 1986. 


Strang, G., Linear Algebra and Its Applications, 2nd ed. Academic Press, New 
York, 1980. 


Williams, G., Linear Algebra with Applications. Allyn and Bacon, Boston, 1984. 


Freshman-Level Linear Algebra 


Althoen, S., and R. Bumcrot, Matrix Methods in Finite Mathematics. W. W. 
Norton, New York, 1976. 


Brown, J., and D. Sherbert, Introductory Linear Algebra with Applications. Prindle,
Weber & Schmidt, Boston, 1984.




Applied Linear Algebra 


Helzer, G., Applied Linear Algebra. Little, Brown, Boston, 1983. 
Magid, A., Applied Matrix Models, Wiley, New York, 1985. 


Noble, B., and J. Daniels, Applied Linear Algebra. Prentice-Hall, Englewood Cliffs, 
N.J., 1977. 


Rorres, C., and H. Anton, Applications of Linear Algebra, 2nd ed. Wiley, New 
York, 1979. 


Numerical Analysis 


Cheney, W., and J. Kincaid, Numerical Methods and Computing. Brooks/Cole, 
Monterey, Calif., 1980. 


Conte, S., and C. deBoor, Elementary Numerical Analysis. McGraw-Hill, New
York, 1978.


Hildebrand, F., Introduction to Numerical Analysis. McGraw-Hill, New York,
1974.


James, M., G. Smith, and J. Wolford, Applied Numerical Methods for Digital
Computation. Harper & Row, New York, 1985.


More Advanced 

Golub, G., and C. VanLoan, Matrix Computations. Oxford University Press, New 
York, 1983. 

Stewart, G., Introduction to Matrix Computations. Academic Press, New York, 
1973. 

Wilkinson, J., The Algebraic Eigenvalue Problem. Oxford University Press, New 
York, 1965. 


Specific Applications 


Graphics 

Berger, M., Computer Graphics. Benjamin-Cummins, Menlo Park, Calif., 1986. 

Magnenat-Thalman, N., and D. Thalman, Computer Animation: Theory and Prac- 
tice. Springer-Verlag, New York, 1985. 

Preparata, F., and M. Shamos, Computational Geometry, An Introduction. Springer- 
Verlag, New York, 1985. 

Rogers, D., and J. Adams, Mathematical Elements for Computer Graphics. 
McGraw-Hill, New York, 1976. 


Linear Models in Statistics 

Dunn, O., and V. Clark, Applied Statistics: Analysis of Variance and Regression. 
Wiley, New York, 1974. 

Graybill, F., An Introduction to Linear Statistical Models. McGraw-Hill, New York,
1961.

Mendenhall, W., Introduction to Linear Models and the Design and Analysis of
Experiments. Duxbury Press, Belmont, Calif., 1968.


Differential Equations and Other Physical Science Applications 

Braun, M., Differential Equations and Their Applications: An Introduction to 
Applied Mathematics. Springer-Verlag, New York, 1975. 

Noble, B., Applications of Undergraduate Mathematics in Engineering. Mathemat- 
ical Association of America, Washington, D.C., 1967. 


Text and Software References 513 


Spiegel, M., Applied Differential Equations, 3rd ed. Prentice-Hall, Englewood 
Cliffs, N.J., 1981. 


Strang, G., Introduction to Applied Mathematics. Wellesley-Cambridge Press, 
Wellesley, Mass., 1986. 


Markov Chains 

Hoel, P., S. Port, and C. Stone, Introduction to Stochastic Processes. Houghton 
Mifflin, Boston, 1972. 


Kemeny, J., and L. Snell, Finite Markov Chains. D. Van Nostrand, New York, 
1960. 


Growth Models and Recurrence Relations 

Goldberg, S., Introduction to Difference Equations. Wiley, New York, 1958. 

Kemeny, J., and L. Snell, Mathematical Models in the Social Sciences. The MIT 
Press, Cambridge, Mass., 1969. 


Linear Programming 

Bradley, H., A. Hax, and T. Magnanti, Applied Mathematical Programming. 
Addison-Wesley, Reading, Mass., 1977. 

Gass, S., Linear Programming, 4th ed. McGraw-Hill, New York, 1975. 

Hillier, F., and G. Lieberman, Introduction to Operations Research, 4th ed. 
Holden-Day, Oakland, Calif., 1986. 


General Applications 


Berman, A., and R. Plemmons, Nonnegative Matrices in the Mathematical Sciences. 
Academic Press, New York, 1979. 


Gantmacher, F., Applications of the Theory of Matrices. Wiley-Interscience, New 
York, 1963. 


For more references, see A Basic Library List, published by the Mathematical 
Association of America, Washington, D.C., 1988. 


Matrix Algebra Software 


General-Purpose Computer Languages and Computation 
Packages with Basic Matrix Operations 


APL. Reference: See Heltzer's book under "Applied Linear Algebra." 

MACSYMA, Symbolics, Inc. 

MAPLE, University of Waterloo. 

MINITAB. Reference: Ryan, T., et al., Minitab Student Handbook. Duxbury Press, 
Belmont, Calif., 1976. 

muMATH (for IBM PC, Apple), The Soft Warehouse (Microsoft). 

TRUE BASIC and other versions of the language BASIC that have matrix operations 
built-in; for example, MATRIX 100, an enhanced BASIC for IBM PCs from 
Stanford Business Software. 


Matrix Computation Packages 


GAUSS (IBM PC), Applied Technical Systems. 
Linear Algebra Computer Companion (Apple), Allyn and Bacon, Boston. 
LIN*KIT (for IBM PC and Apple), Wiley, New York. 




MAC (MatrixAlgebraCalculator) (IBM PC, Rainbow), Professor E. Herman, 
Mathematics Department, Grinnell College, Grinnell, Iowa. 

Matrix Calculator (Apple), CONDUIT. 

MATRIX (IBM PC, Apple, Macintosh), Decision Science Software. 

PC-MATLAB (for IBM PC), The Math Works, Portola Valley, Calif. 

The following two packages, designed for larger computers, are the best matrix 
computation software in existence. PC-MATLAB and MAC use parts of these 
packages. 

EISPACK. Public domain. Reference: Smith, B., et al., Matrix Eigensystems 
Routines—EISPACK Guide, 2nd ed. Springer-Verlag, New York, 1976. 

LINPACK. Public domain. Reference: Dongarra, J., J. Bunch, C. Moler, and 
G. Stewart, LINPACK User's Guide. SIAM, Philadelphia, 1979. 


Chapter 1 


Solutions to 


Odd-Numbered Exercises 


Section 1.1 


1. 84 feet. 3. VH,/4. 5. A = 1500, B = 500. 7. W = 3, h = 6. 
9. C = 7,R = 2. 11. M = 22,000, S = 14,000. 13. A = 12, B = 44. 
15. Slower ferry = 13.5, faster = 18.5. 17. lim = —% (k= 1). 


k—!1 
19, k = § W = 12. 21. Answer not reasonable: IQ,, = 240, IQ, = 0. 
23. P = 5, J = —3 (cannot be negative). 
Section 1.2 


1. Heating oil and diesel oil both off by 120. 3. x_1 = 4, x_2 = 30, x_3 = 40. 
5. (b) x_2 must be negative. 7. (b) x_1 = 45, x_2 = 25, x_3 = 35. 

9. Energy off by 10. 11. (a) x_1 = 306, x_2 = 212, x_3 = 160, x_4 = 37; 

(b) x_1 = 332, x_2 = 263, x_3 = 163, x_4 = 43; (c) x_1 = 378, x_2 = 287, 

x_3 = 175, x_4 = 46. 13. (b) x_1 = 2.3, x_2 = 28, x_3 = 63. 

Section 1.3 


3. (a) [0, .25, .5, .25, 0, 0]; (b) [.125, .375, .375, .125, 0, 0]; 
(c) (+5, #5, os, fs. 7, 0] (dd) [.1, .2, .2, .2, .2, 1]. 5. (a) 
(b) % (c) is 


Col Colpo 
hh poh 
—————_— 


A ees eon TD 0 
Wists a seo FO 
7. (a) slo 25 5 of (b) [.375, .5, .125, O}, [.31, .47, .19, .03]. 
ya) US | a -  | 
1. 8 2 & 2 0 
Deo ee “OP aT 
ice, (ae Ow ee d = (0, .4, .3, 3,0, 0,0] 
next round = [V, .4, .3, .3, 0, O, O], 
i | A a a second round = [.16, .24, .33, .18, .09, 0, OJ. 
RAG OO) Ss See eS 
SAB RE EARS, OS: Sar 
OD i Os Bee 


11. (a) (3, 3]; @) (, 4); (© and @) [.1, .2, .2, .2, .2, .1); 

(e) [.86, .14]; (®) [.6, .4]; (g) [.4, .2, .4]; (h) [.01, .01, .01, .96]; 

(3). 1.35.52, .2; 28 GG) £83, ~0, ~0;.-0, ~0, ~0,..17] 

(k) {.. 70, ~-0,.~0, ~0, ~0, ~0, 30}; @ £535, ~0, ~0,.~0, ~0, ~0, .4 71; 
(m) [ABC 0, AB 0, AC 0, BC 0, A .27, B .19, C .44, none .10]. 

13. (a) [45, 50], [39, 50], [32, 49]; (b) [—20, 33]. 

15. (a) [30, 110], [3, 126], [-34, 151]; (b) [50, 130], [47, 172], [39, 231]. 
17. (a) [29.4, 22.8], [28.9, 21.8], [28.5, 21.0], converges to [27, 18]; 

(b) [8.4, 3.7], [8.7, 4.3], [8.9, 4.8], converges to [9.75, 6.5]; 

(c) converges to [4.5, 3]; (d) converges to [7.5, 5]. 19. (a) Line F = §R; 
(b) [10, 15] converges to [-15, -10]; (c) [1, 2] converges to [-3, -2]; 
(d) convergence in one period. 21. After 3000 days, you are closer to the 
starting point than before. 


Section 1.4 


1. x, = 15, x, = 65. 3. (18.4, 3). 5. Mathematics/Science. 

7. C = 0, W = 160, objective function = 6400. 

9. (a) Minimize 50x_1 + 40x_2 subject to x_1, x_2 ≥ 0, 20x_1 + 50x_2 ≥ 500, 

30x_1 + 100x_2 ≥ 1000, 10x_1 + x_2 ≥ 200, 15x_1 + 2x_2 ≥ 50; (c) minimum of 
1144 at x_1 = 19.6, x_2 = 4.1. 11. Max (i) = Min (ii) = 18. 
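
The figures quoted in answer 9(c) can be checked numerically. The sketch below rests on two assumptions that come from reading the scan, not from the text itself: that the four resource constraints are "≥" inequalities, and that the minimum sits where 30x_1 + 100x_2 = 1000 meets 10x_1 + x_2 = 200. The helper name `solve_2x2` is made up for illustration. The code only confirms that this corner point is feasible and reproduces the quoted numbers; it does not search the other corners.

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("lines are parallel; no unique intersection")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y

# Assumed binding constraints: 30*x1 + 100*x2 = 1000 and 10*x1 + x2 = 200.
x1, x2 = solve_2x2(30, 100, 1000, 10, 1, 200)
cost = 50 * x1 + 40 * x2

# Assumed ">=" form of all four constraints, plus nonnegativity.
tol = 1e-9
feasible = (
    x1 >= 0 and x2 >= 0
    and 20 * x1 + 50 * x2 >= 500 - tol
    and 30 * x1 + 100 * x2 >= 1000 - tol
    and 10 * x1 + x2 >= 200 - tol
    and 15 * x1 + 2 * x2 >= 50 - tol
)
print(round(x1, 1), round(x2, 1), round(cost), feasible)
```

Rounded to the precision of the answer key, the corner point and cost agree with the values 19.6, 4.1, and 1144 given above.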


Section 1.5 


1. (a) 23; (b) 8; (c) 1; (d) 25; (e) 8. 3. (a) YX; (b) KU; (c) ZA. 
5. (a) x =7 (mod 26); (b) x = 13 (mod 26); (c) no solution; 

(d) x = 15 (mod 26). te (ay 25 24. 24, ST, 2h... S 

(b) 24, 26, 26, 27, 28, . ... ; (ey 30; 24, 28, 26, .... 

9. (a) 2,5, 4, 5, 4, 7, 5, 9, 5, 10, 7, 11, 12, 11, 16, 14, 18, 15, 19, weak 


smoothing; (b) 3, 3, 4, 5, 5, 6, 6, 7, 8,9, 8, 10, 11, 12, 14, 16, 17, 17, 18, 
good smoothing; (c) 5, 4, 4, 4, 6, 4, 8, 6, 9, 6, 11, 10, 12, 14, 13, 16, 10, 18, 
15, fair smoothing. 1. (a) d'_i = (d_{i-2} + 2d_{i-1} + 3d_i + 2d_{i+1} + d_{i+2})/9; 
(b) d'_i = (d_{i-4} + d_{i-3} + 2d_{i-2} + 2d_{i-1} + 3d_i + 2d_{i+1} + 2d_{i+2} + d_{i+3} + 
d_{i+4})/15. 15. (a) 6; (b) K; (c) U. 
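
The five-term smoothing formula in the answer above can be tried on data. This is a minimal sketch, assuming the weights are 1, 2, 3, 2, 1 (divided by 9) on the two neighbors either side of each point; the subscripts are garbled in the scan, so that index pattern is an inference. Leaving the end points unchanged is also an assumption, since the answer does not say how they are handled.

```python
def smooth(data, weights=(1, 2, 3, 2, 1)):
    """Weighted moving average; points without a full window are left unchanged."""
    half = len(weights) // 2
    total = sum(weights)
    out = list(data)
    for i in range(half, len(data) - half):
        window = data[i - half:i + half + 1]
        out[i] = sum(w * d for w, d in zip(weights, window)) / total
    return out

flat = smooth([4, 4, 4, 4, 4, 4])   # a constant series should stay constant
spike = smooth([0, 0, 9, 0, 0])     # a single spike is reduced to 3*9/9 = 3
print(flat, spike)
```

Passing `weights=(1, 1, 2, 2, 3, 2, 2, 1, 1)` gives the nine-term filter of part (b), whose weights sum to 15.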
Chapter 2 
Section 2.1 
2 3 
l. (a) {1 2 3 4); (14h: © Il6i @ 4 (e) 3. 
5 7 
ATk 4 0 5 
3. (a) H1 3 3 O}; (b) 0;]41]; (©) rows | and 2, columns | and 2. 
S{4 4 1 5 
5. (a) a, = gallons of diesel oil from | barrel by refinery 3, 
a ¢ 38 a Al 
(b) |} 10 14 10]; (ce) | 10 = 4}. 
» 3S 24 : 
ry 
q 4 | 7) 
20 10 


[1,0, 0] 


11. Operations not commutative at entry where row and column intersect, value 
at entry (1, 2) depends on which interchange first. 
13. (a) 2J + 4I; (b) J - A; (c) 3J - 2A + 4I. 


15. 
5a: 76 &:3 
76 60 88 
ie tee ee 
§.4 5.4 6:2 

17. 5 INPUT R: INPUT S 
10 FOR I = 1 TO M: FOR J = 1 TO N 
15 C[I, J] = R*A[I, J] + S*B[I, J] 
20 NEXT J: NEXT I: END 
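
The BASIC loop in answer 17 forms C = rA + sB entry by entry. For readers without a BASIC interpreter, here is a rough Python equivalent. The function name `linear_combination` and the sample matrices are illustrative choices, not part of the exercise.

```python
def linear_combination(r, A, s, B):
    """Return the matrix C with C[i][j] = r*A[i][j] + s*B[i][j]."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("A and B must have the same shape")
    return [[r * a + s * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]

# Sample data chosen for illustration only.
C = linear_combination(2, [[1, 2], [3, 4]], 3, [[0, 1], [1, 0]])
print(C)
```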
Section 2.2 
1. (a) 2; (b) 5; (c) 38; (d) 14. 3. (a) aA = [14, 25, 36, 47]; 
0 
(b) bB = [5, —7, 2]; (c) not defined; (d) not defined; (e) Bb = | —8 |; 
4 
38 
18 
f —— 
(f) Ce 74 
29 


3 .190 40 F514 
. ; A $3.70, 69. 
5. (a) j me 1S Oe | , (b) store A $3.70, store B = 69 
2 1 —2 
7. (a) A = 2 a ed hel ee (b) A= } 1 0 xf 
~ =5 3 
as =] 0 
0 2 -!1 0 
b= 1/3], Ax =b; (c) A= ]3 2 QO}, x = Ax. 
5 4 -3 0 
be 4 
2 
a) el We a 
-p=prt 11. dAp = 13. -—31= 
9. p p + Ap p = 2210 3 avy 8 ; 9 
0 4 3 —9 
0 0 4 4 


5 19 -4 F Af ! 6 10 
17. (a) Pe 37 | (b) & rb (c) not possible; (d) & 4 


e) [25 94 -20 
* 165 303. ~45 |" 
14 re 
19. (a) [1, -1, 0) (b) ] 28]; @ | _ J]. 24. (BANC), = —40. 
35 Ke 
160 155 
23. (a) | 182 169]; (b) [70 235 95}; (©) [1835 1765); 
95 100 CTA C™(AB) 
AB 
156.5 
(d) [2.7, 5.3, 3.7]; (e) | 172.9 |. 
BD | 98.5 
ABD 


25. (a) [280 120 100]; (b) A7D; (e) entry (1, 1) in B’(AC). 


$ 64 OF i oat . 
27. (a) F i).4 (b) e ea (c) i Hf: approaching (3, 4]. 
So wf Tang ij 
29. (a)/}0 2 Of; (b) (AB); = a,b;. 31. (d)]0 1 O}; 
iPr “G33 > 2. 0 
4 Ss 
(e) | 0 l Q |. 
0 Q l 
Section 2.3 


1. (a) 5 o———ed ; (b) . ; ec) : b 


b d 
e 


mm 


324312 1010-0 

eee 1313 2 02010 

1, go gk M223 B 2k lt o' 20 4h 

ag eee 13132 01020 

eg ae ee a G66 4 

522200 201020 

s Fe 9 i 03.0202 

ae oe ‘en ee es ee a 
47) — I- 

RE NE gs gg A ee gs a 

ie She a 7020 3-0 

bre 2 Ve a 020102 


5. Every entry in A(G) or A*(G) is positive for G, and G,. 
7. D, points vector [5, 3, 2, 0], D, points vector [9, 5, 4, 2, 0]. 11. (a) 0; 


(b) 1; (c) 0. 13. Parity will remain even if two bits are changed. 
0 0 0 

15. (a) e = | 0], fourth bit; (b) | I |, sixth bit; (c) | 1 |, second bit. 
l l 0 


17. New Q obtained from old Q by interchanging rows 3 and 4; ¢ = 
[l, I, Q, i, Ie 0, 0] 


ee a a ee ee ee 

met 1 ee 22 OC oe Oy £4 
eS he roy f a ONO. OM a Eee ak 

0 2-0.6°6°0.0. F Pat 2 4 Hop a 
Section 2.4 


lL@h: ha @l @Me 1; Ol Ws ha; W@W O. 

3. (a) Qx = x; (b) (Q - I)x = 0. 5. (a) Ax = By + c; 

(b) Ax - By = c; (c) x = (I — A) + By = c. 7. (a) p' = p + Ap; 
(b) p' = (I + A)p; (c) p® = (I + A)™*p. 

9, A(x? + x*) = Ax? + Ax* = b + 0 = b. 


a oat oe | —} Ae ae 

il i eg 8 |: (b) ee a ee! 

im a EP 4° OP 
i a - We 7 
ibe 2 JOD 
ee Ne ae | 
(O).\0 8. .f “EZ 10048. 
ok ale 0 
Do! Os ] 


13. (IA)I = 1-1 = 5. 15. (a) 1A? = 1, 
(b) 1A? = (AADA = (DA = 1; (c) 1A¢ = (149A = (DA = 1. 


5 
| 
19. y, =3 y l 
l 
l 
5 


CD — — WH — WN 


I 3 ey 


21. Entry (i, j) of M(G)M(G)^T tells how many (0 or 1) entries rows i and j have 
in common, that is, whether an edge joins i to j. 23. (A), = a? - af = af-af = 
(A?);;. 27. Let C have at least two columns. 33. (a) (AB)^2 = ABAB; 
(b) let A and B be diagonal matrices. 


Section 2.5 


3. (a) |a - b|_e = √46, |a - b|_s = 10, |a - b|_mx = 6; (b) sum norm equals 
sum of (absolute) differences of the entries in a and b; (c) max norm equals 
largest (absolute) difference between an entry in a and the corresponding entry in b. 
5. (a) (a) and (b) x* = e,, (c) and (d) x* = e,; (b) (a) x* = [I], 1], 

(b) x* = [-—1, 1], (c) x* = fl, 1, 1], (d) x* = [-1, 1, 1). 7. (a) 6; 

(b) 24; (c) [4, 4, 4]; (d) 6-6 = 1296. 

9. (a) (i) |All, = 1.2, ||Allax = 1.25, (ii) ||Al], = 1.3, |lAllax = 

(b) (i) |p’|, = 38 =< |All pl, = 48, |p'lax = 19 < lAllnalPlox = 

(ii) |p'|, = 32 = |All, Pi, = $2, [p'lax = 18 = AllxiPlax = tt 


(c) (i) 69, (ii) 88. . 375. 13. (a) 4,8 + (b) wo, # 
15. (a) Assuming that ‘a is largest absolute entry in a, then 
lal nx = la,| = Vaz = V aj 5 as ee oe lal. 


17. Assuming that |b,| is largest absolute entry in b, then 
ja- bl = [2, a,b) = , (ja\|b|) = &; lallb,| = (&; lad)|b,| = lal, + [Dhnx: 


19. A symmetric means sum of ith row equals sum of ith column. 
21. (a) ||A + B||_s = largest (absolute) column sum in 
A + B = max_i |a_i^C + b_i^C|_s ≤ max_i |a_i^C|_s + max_i |b_i^C|_s = ||A||_s + ||B||_s. 


25. (a) Av = be ; of both entries in Av increase as a and b increase, so 


= b = | maximizes |Av),,,; (ce) If first row of A has larger absolute sum and, 
Say, @,> is negative, then let v = [1, —1]. 
29. (a) 2^4[1, 1] + 2[1, 0] = [18, 16]; (b) 2^7[1, 1]; (c) 9u_1 - 3u_2, 
[285, 288]. 31. A^2u = A(Au) = A(λu) = λ(Au) = λ(λu) = λ^2u. 
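
Answer 31 says that each multiplication by A scales an eigenvector by λ, so k multiplications scale it by λ^k. The sketch below illustrates this with a hypothetical matrix A = [[1, 1], [0, 2]], chosen only because it has eigenvectors [1, 1] (λ = 2) and [1, 0] (λ = 1), which is consistent with the numbers in answer 29. The matrix actually used in the exercise is not shown in this section.

```python
def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power_apply(A, v, k):
    """Compute A^k v by k successive multiplications."""
    for _ in range(k):
        v = mat_vec(A, v)
    return v

A = [[1, 1], [0, 2]]       # assumed example matrix, not taken from the text
u1, u2 = [1, 1], [1, 0]    # eigenvectors for lambda = 2 and lambda = 1

# A^2 u = lambda^2 u for each eigenvector
sq1 = power_apply(A, u1, 2)
sq2 = power_apply(A, u2, 2)

# A^5 (9 u1 - 3 u2) = 9 * 2^5 * u1 - 3 * 1^5 * u2
p = [9 * a - 3 * b for a, b in zip(u1, u2)]
p5 = power_apply(A, p, 5)
print(sq1, sq2, p5)
```

Writing a starting vector as a combination of eigenvectors turns repeated matrix multiplication into repeated scalar multiplication, which is why the fifth iterate can be read off without forming A^5.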




Section 2.6 
1. (a) 1000; (b) 10°; (c) 2000; (d) 500; (e) 2000; (f) 9000. 
Za 6 CUC«iRR 
| SE Se 
3. (a) If Ris a2 X 4 matrix of 1’s, M = R RI 
R 2R 
; @ i a 
at |e Mae ER L=8 
me De sh M = [on eat 
ot: sb 


5. Additional last row of D' is [0 0 0 0 1]. 


at: Air ae Mr Sy cae et 
ajo 44300 0 i 
bis 0 0 0 0.0 0 
ote OF) 2°06 6.8 6 
NZ 
na] 4400000) =|" Nl 
£10.56 6:60 & £4 
fie 3 0 0 §¥.0 0 4 
gio 0.0 0 30 04 
hlz 000 § 4 £ 0 
ve ae ae 
R 2J 42 43 | , 
—_— ~ “ : 
9. M Pe 2 |. were aes id and J is 4-by-4 matrix of 1’s. 
2 4.4 
B O O BO. O 
11. (a) |O 2B O}; (b)|/O 4B’ O,}: 
O O 3B O OO. 9B 


(c) 45 mults. versus 729 mults.; (d) [2 0 2 4 8 4 18 0 18]. 


7 
13. (a) t ; (b) es ot 
| B 


15. (a) If A= [A, A, A,], B = | B,], then AB = 
B, 


[A,B, + A,B, . A,B]; 


nr ae 
yo ¢ € & 
Welt $ & «COU 
Ow $$ 4 § 


etc. 
Chapter 3 




19. (a) Bandwidth is 2, .5 on main diagonal, .25 just off main diagonal, columns 
1 and 14 same as original frog Markov chain; (b) bandwidth is 3, . . . , .0625, 
.25, .375, .25, .0625, . . . . 21. Bandwidth is 2k - 1. 
R Q 
23 A = 
(a) bes 
50 5.0.4 °9 
05 0 & O48 
A’ A” x .8°S ks 0 
Y ‘= 
ms Fr Ar, were A 04050 8)’ 
$1.0 & 6 iSO 
O63 Oo 4 OS 
; 2 82-0) 4 
ae | ae RY aes 
va O23 t 2 603'] 
Ose Ok ee 
i 4 6 3.0 2 
AO 2 0.26 


(c) call submatrix Q: for n even, Q^n = I; for n odd, Q^n = Q. 


Section 3.1 


1. (a) -17; (b) 0; (@) -2. 3. x, = (DW —- G)/I5, 
x, = (L0G — SD)/15a. 5. (a) Unique x = y = 0; (b) (x, y) = r(1, 4); 
jay ="G,)t. .@-2% oc © @s. © -4 


(f) 3.3.x 10-8. 9. x, = 334, x, = 67}. 11. Det = —.002. 
13. x = [(2e)d — (2b)f)/[(2a)d — (2b)c], similarly for y. 
15. (a) Det = a5,4). — ao2a)). 19. (a) 24; (b) 0; (c) 24. 


21. Area(ABC) = area(ABB'A’) + area (BCC'B') — area(ACC'A'), where 
area(ABB'A') = 3(x, — x,)(y> + y,), similarly for BCC'B', ACC'A’. 

23. (i) λ = 4, 2, u = [1, 1]; (ii) λ = 5.3, -.37, u = [.46, 1]; (iii) λ = 4, 1, 
u = [1, 2]; (iv) λ = 3, 3, u = [1, 1]; (v) λ = 2, 1, -1, u = [1, 1, 0]. 

25. λ = 1.05 + .09i (imaginary), u_1 = [-0.58 + i, .67i]. 

27. (a) λ = 5, 2, u = [1, 2], v = [1, -1]; (b) p = (10/3)u + (5/3)v; 

(c) (10/3)5*[1, 2]. 33. det(A - λI) = (a - λ)(b - λ). 35. (a) 3; 
(b) 7; (c) 3. 


Section 3.2 


l(@x=%y=3 x=, y = -% (x =2,y = 6. 
3. (a) x = [30, 14, -9]; (b) x = [3, -4, 0]; (c) x = [-1, 2, 3]; 
(d) x = (#5)[37, 15, 9]; (e) not unique; (f) x = (7p)[—21, —76, 164]. 
: © Ol2 =3 2 l O OWf-1 -!1 ] 
5. (a) rT On $ Ol; (b) 1-1 1 O 0 -3 4 
“1 7 ifto @ «5 =2 0 i -0 © =? 
1 0 O|f-1 -3 2 LO O24 as 

Ola? ¥ OF 0). 9 Ol 4 v “Oe =a—3 
-5 ¥ 1] 0 Oo -1 -} 1{|/0 oO -? 
Tee te a: | Ll 0 .0]f2 -—3 =1 

12 2 OO Sh —S ap ]e 4 Olle -<¥" =3 
SF PMO BD : -395 1]f0.- 0 11 


7. (a) x = [2, 0, 3]; (b) x = [8, —*4°, -#]; © x = [-*, 2, #; 
(d) x = (s5)[—55, 60, —80]; (e) no solution; 
(f) x = (7D[-—40, —205, 425]. 


) oe B 
Lf. & @ 
: = 13, 335, —-16, 11, L = é 
9. (a) x = (3)[13 1] ie 
B aaa 
I 3 2 -! 
0 -2 -!1 2 
poet Pe er ee 
0 0 QO -3 
| 0 0 0 
. = 1s si eG 
(b) x = [i, 1, Bi 2|,L = : = | 0 ’ 
4 1 -—3 l 
3 2 ] 0 
an a! 
wad | eat ie oe 
0 Q 0 2 
Ee & 9 
(c) x = (4)[19, —2, —29, 18], L=]2 1 00 
(rows 3 and 4 switched > ar? 4.07 
to avoid zero pivot) > . 3 Bb 4 
l | —] -] 
QO =-2 2 3 
ee le) 36. ea. SOR 
0 0 0 -¢% 


11. About n°/2. 13. (a) Multiple solutions; (b) no solution; (c) x = 
(47.8, —54.3, 160.9]; (d) x = [*9°,74°, 0]. 15. (a) x = [10, —2, 0); 
(b) no solution; (c) x = [*¥, 6, 3,4]; (d) x = [3, 0, 0, 11. 

17. Jello = 5.04, fish = 4.89, meat = 1.53. 19. 40 micros, 200 terminals, 
20 word processors. 
0] 15 -3 0 0| -1 

0 450]; (b)}-5 10 a]. 

1| 50 Eo psy 
a b 

23. (a) L = ,U= | 

~ oe ie | 5 i Kae 


l 0 O ax 6x" e* 
25. L=/x/3 1 O1,U= —x?  Exe* |, det = 21x3¢". 
2 
l 


2/x 9/x 


Section 3.3 
a. (ax, +2, =1 —] 
X, + 3x, = 0; (b) | —3]. 
xX, + & =0 


5. (a) & aI (©) (ji) [0, 1, Gi) [8, 41, Gii) [-2, 3}-nonsense; 


(d) (i) Nonsense, (ii) [§, 4], (iii) nonsense. 


—-§ *# -4} —j ¢ f 
7. (a) | -1 2 O};; (b) | -3 -§% —-#4]; 
$s -¢ 4 -1 -3} 0 


— fF w% 
(e) no solution; (f) | —#9 1 7 |. 
" ~- -4 


9. (a) [¥, Zz =]; (b) [é, — §, —$], (c) [s, e. P =F), 

(d) sal 14, — 10, 38], (e) no solution, (f) (a. it, #1; (b) (a) (5, 0, =i. 

(b) [—4, %, 0], (c) [#, —34, —42], (@ 4I40, — 12, 16], (e) no solution, 
(f)[—t, —t, ma]; (©) (a) [¥, 2, —8], &) -£&. 3, H, (© 4137, —23, —16), 
(d) s3[34, — 16, 2], (e) no solution, (f) 5, 48, — 38). 


5 4-329 e¢ £44 
* 4.3. .9°3 fae Mes 
M..fa) 3° 3 3 20VE. Olt be hb AL. 
2 PS «Qe J 4 € 1 ¢ 2 
oe Cher i we 


13. No inverse. 


0001 —.127 038 
15. (a) | —.0004 382 —.013 |, [5.04, 4.89, 1.53]; (b) .51 less Jello; 
005 —.089 001 


(c) .013k more. 
1 1 

oo 1 z 
17. (a) | 4 0 —%], [100,000, 50,000, 100,000); 

-$ 2 $ 

(b) change = [— 100,000, 0, 200,000]; (c) change = [—S000k, —SQ00kK, 
15 ,000k]. 
21. (BA)C = (A^{-1}A)C = IC = C or B(AC) = B(AA^{-1}) = BI = B. 
25. Entry (i, i) in inverse is 1/a_ii. 27. (a) A^{-1} is same as A except the a in 
entry (i, j) becomes -a; (b) part (a) is true wherever a is. 
29. (a) Has inverse, (b) no inverse; (c) cannot say, (d) no inverse; 
(e) cannot say (A may not be square). 


1 olf4 olf 1 0 
a k ‘lp 4 Ue i: 
4 1|[54  olf.s2  .76 
vag * ap 4G wal 


1 It}4 O}F8 = 4) | | 
(c) f a it “ i (d) not possible, only one eigenvector. 


35. (a) If A = U D_λ U^{-1}, then A^2 = AA = (U D_λ U^{-1})(U D_λ U^{-1}) = 
U D_λ (U^{-1}U) D_λ U^{-1} = U D_λ D_λ U^{-1} = U D_λ^2 U^{-1}. 

37. If 0 is an eigenvalue of A, then Ax = 0 has multiple solutions. So Theorem 
4, part (i) is false, and A has no inverse. 


Section 3.4 
1. (a) λ = 2, [1, 1]; (b) λ = 3, [-1, 1]; (c) λ = 4, [1, 1]; (d) iteration 
does not converge. 3. (a), (b) Cycles 45° around unit circle, iteration does not 
converge; (c) λ = (√2/2)(1 ± i). 5. (a) λ = 1.33, [.70, .21, .09]; 
(b) first and second largest eigenvalues are close together. 

7. (a) [178, 150, 150]; (b) [202, 148, 135]. 9. Sum of powers fails 
because ||D|| > 1. 11. (b) (i) [4.11, 1.15, .04], (ii) [-3.10, 11.03, .52]. 


QO -$ 3 
13. (a) D=|-% O #]; () x = [-.60, 4.25, .54]. 
is —io 0 


15. (a) (i) Does not apply, (ii) applies for (30) and (31), (iii) does not apply, 
(b) (i) 2x, + x> = 4, apply (30); (ii) $x; — x, = 5, apply (30). 

3x, — 4%, =2 fx; + x = 3 
17. (a) Convergence much faster. 


Section 3.5 


1. (a) 8°; (b) 8°/3; (c) 8. 3. k = 10. oS. bee + 5S, Xe + 4, %6 + 3; 
Xe + 3,%, + 2,%]. 7. Os, ts, te, te... -. as, de]. = 9. (a) No new 
nonzero entries are created during elimination; (b) All the upper right side of 
the matrix becomes nonzero. 11. w? multiplications per pivot. 

13. (a) x = 0, y = -1; (b) x = 0, y = 5; (c) x = 0, y = -.333. 

15. (a) Any [x, y, z] with y = z + 1 is asolution; (b) x = 1, y = .333, 
z= .00011. 17. @).¢ = [=—.393, 0], 3" = &-= [.338, —T}; 

(b) ¢ = [1.75, 0], x* — e€ = [-1.75, .5}; (Cc) © = [-—.333, 0], 
x* - e = [.333, -.333]. 19. (a) 21; (b) 5; (c) not invertible; 

(d) 4606. 21. (a) |el,/|x + el, = 1.4; 

(b) x' = [12, 28.4, 66.5], lel,/Ix + el, = .13. 23. (a) 525%; (b) 386%: 
(c) (i) |e| ≤ ||A^{-1}|| |e'|, (ii) ||A|| · |x| ≥ |b| (i.e., |x| ≥ |b|/||A||), then dividing (i) by 
(ii) yields |e|/|x| ≤ ||A|| · ||A^{-1}|| · |e'|/|b|. 25. c(A) = %, 94% error. 

27. 1 = ||I|| = ||AA^{-1}|| ≤ ||A|| ||A^{-1}|| = c(A). 


Chapter 4 
Section 4.1 
1. (a) x = =x, y = -y; () X = 2e + 7, y = y + 33 ©) xX =x +F y, 
y= yy, @xo= =47' =p 33. @&) 0,9, (—1; © 0, -—D. (1, —); 


(b) (7, 3), (9, 3), (7, 4), (9, 4); (©) (0, 0), (1, 0), (1, 1), (2, 1); (d) (O, 0), 
(— 1, 0), (0, I), (—1, 1). 5. Lower left corner of grid and x-by-y size of 
each square given (a) (—3, —2), 1-by-1; vf (10, 1), 2-by-1; (c) (—1, —2), 


l-by-1 trapezoid; (d) (—3, —2), 1-by-l. 7. (a) x’ = —y, y’ = 2x; 

(b) x’ = —x, y’ = -y; ii Save ty = —y — 3. 9. x’ = x, 
y =4-y. ll. x’ = -y + 4,y' =x — 2. 13. (a) x' = —2y, 

y’ = —2x; (b) x’ =x, y’ = y. 17. (b) x = —x + y, 9 = —2x + 2y. 


19. (a) x' = .866x - .5z, y' = y, z' = .5x + .866z; 

(b) x' = cos 10°x - sin 10°y, y' = sin 10°x + cos 10°y, z' = z; 
(c) x' = .75x - .5y - .433z, y' = .433x + .866y - .25z, z' = .5x + .866z. 
21. x' = x + .216y + .375z, y' = .991y - .284z. 

23. (a) x' = x, y' = .866y - .5z + .134, z' = .5y + .866z - .5. 
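
The coefficients in answer 19(c) look like the product of two rotations. The sketch below composes a 30° rotation in the x-z plane (the map in part (a)) with a rotation in the x-y plane, also taken as 30°. That second angle is an assumption made here to match the printed coefficients; part (b) as printed uses 10°, so the exercise statement should be consulted for the intended angles.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

c, s = math.cos(math.radians(30)), math.sin(math.radians(30))

# Rotation in the x-z plane: x' = .866x - .5z, y' = y, z' = .5x + .866z
R_xz = [[c, 0, -s],
        [0, 1, 0],
        [s, 0, c]]

# Rotation in the x-y plane by the same angle (assumed to be 30 degrees here)
R_xy = [[c, -s, 0],
        [s, c, 0],
        [0, 0, 1]]

# Apply R_xz first, then R_xy; round to three places for comparison with the answer.
combined = [[round(v, 3) for v in row] for row in mat_mul(R_xy, R_xz)]
for row in combined:
    print(row)
```

The rows of `combined` give the coefficients of x, y, z in x', y', z' respectively.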

25. (a) &- 251, 1; -@) 2: 21k () -—¥- 2h, 1. 

27. 4-471, 1] + [l, —2], 4-471, 1). 29. (a) x = x' - y', y = y'; 
(b) x = .707x' + .707y', y = -.707x' + .707y'; (c) not invertible. 


31. T([a, b]) = T(ae, + be,) > aT(e,) + bT(e,) = .\ 4) + 


Section 4.2 
1. (a) ŷ = 1.82x, SSE = 43.2; (b) y = x + 3.57, SSE = 15.7; 
(c) y = x’ + 6.57. 3. (a) x̂ = -.86y + 12.43; (b) SSE different for 


x- and y-values. 5. (a) y = .17x - 6.56; (b) y = .5x - 28.2; 
(c) y' = .5x' + 3.3. 7. g' = .089, y = 11/x. 


Section 4.3 


1. (a) 2N2H4 + N2O4 → 3N2 + 4H2O; 

(b) 2C6H6 + 15O2 → 12CO2 + 6H2O. 

3. 15PbN6 + 44CrMn2O8 → 22Cr2O3 + 88MnO2 + 5Pb3O4 + 90NO. 
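
A balanced reaction has the same count of every element on both sides, which is easy to check mechanically. In the sketch below the formulas are written out by hand as dictionaries. The subscripts are a reading of the garbled scan (N2H4, N2O4, PbN6, CrMn2O8, and so on) and should be treated as an assumption; they were chosen because they make the printed coefficients balance.

```python
from collections import Counter

def atoms(side):
    """Total atom counts for one side: a list of (coefficient, {element: count})."""
    total = Counter()
    for coeff, formula in side:
        for element, count in formula.items():
            total[element] += coeff * count
    return total

# Answer 1(a), as read here: 2 N2H4 + N2O4 -> 3 N2 + 4 H2O
left_1a = [(2, {"N": 2, "H": 4}), (1, {"N": 2, "O": 4})]
right_1a = [(3, {"N": 2}), (4, {"H": 2, "O": 1})]

# Answer 3, as read here:
# 15 PbN6 + 44 CrMn2O8 -> 22 Cr2O3 + 88 MnO2 + 5 Pb3O4 + 90 NO
left_3 = [(15, {"Pb": 1, "N": 6}), (44, {"Cr": 1, "Mn": 2, "O": 8})]
right_3 = [(22, {"Cr": 2, "O": 3}), (88, {"Mn": 1, "O": 2}),
           (5, {"Pb": 3, "O": 4}), (90, {"N": 1, "O": 1})]

balanced_1a = atoms(left_1a) == atoms(right_1a)
balanced_3 = atoms(left_3) == atoms(right_3)
print(balanced_1a, balanced_3)
```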
‘R422, =4,4=6 k= 82.6 = 1.) = 3% = i. 
9. y(t) = 2y(t), —s AL. (a) y(t) = 2), x") = —4y() + Sxl); 

(b) y(t) = x(t), x(t) = —6y(t) — S5x(1); 

(c) y(t) = x(t), x(t) = 2(t), z'() = —2y(t) + 3x(t) + 42(t); 

(d) y(t) = x(t), x'() = 2(t), 2’) = yt) + 2x(0). 
15. (a) x(t) = y(t) = 10e*; 
(b) x(t) = 10/3{2e" + e}, y(t) = 10/3{4e" — e4: (©) x(t) = y(t) = 10e%. 


Section 4.4 


1. Cycles through [1, 0, 0], [0, 0, 1], [0, 1, 0], [3, 4, 4] is stable distribution. 

3. (8, 4). 5. [#, @). 7. p* = [48, as, 33], columns of A! = p*. 

9. (b) Not regular. St. Co) T/C — Jr. 2. 222... 5 231k: 

) Waele a 2”... 9 4 1 Wher, a + Set 
(c) qq /f2"-3, 2°74, 2"-8, 2"-8, 4,1, 1,4, . . 28-8, 20-8, an-4 gn-3), 
(a) 1/(2n — 2)(1,2,2,2,...,2, 1). 

13. If A begins, 


l-a b 
a l-b-c 
0 Cc 
0 0 
A — lis then 
—a b 
a -b-c 
0 c 
0 0 


—a b 
Q -c 
0) c 
i) 0 


For second-column elimination, add second row to third row (all other middle col- 
umns like this). 
15. 3.3,14.2. 17.32,5. 19. (a) % (&) #® © #. 


$s 2 € 1 $$), IN = [¥, 12, ¥, 12, ¥, 
ae Se 4 

WN 1s Ss F 3 8 a ae 
SS ss ae Se? mete a 
2 2s 

25. 48, 8. 


Section 4.5 


1. (a) 52%, [.48, .31, .21]; (b) 0%, [.57, .29, .14]; (c) 17%, [.62, .27, .11]; 
(d) 13%, [.58, .26, .11, .05]; (e) -16%, cyclic. 


3. λ = 1, -.5 ± .866i (for all λ, |λ| = 1). 
5. Reduced matrix U = 


oo oc co = 


7. For herd of constant size, we want a small number of females (we only harvest 
50), but then there are not enough females to give birth to as many males as are 
needed to be able to harvest 100 males annually. 

9. (a) [742, 742, 405, 405, 408, 408]; 


— .091 2.21 0 4.05 0 7.42 
0 1.30 0 4.05 0 7.42 
(b) —.5 1.20 —.9I1 2.21 0 4.05 3 
0 71 0 1.30 0 4.05 
— .085 2.06) —1.56 3.48, =2.86 6.94 
0 1.21 0 2.23 0 4.08 
11. a, = 0.8a,_). 13. a, = a,-, + @,-> Ag = 34. 15. (a) ao) = 2; 


(b) ay) = 5° 4: (©) dy = 4-4 (d) ay = 1. 


rw 2[if welt] alt} oee[t] ecw t| 
l 
wif] 


21. If a_n = na^n, then a_{n-1} = (n - 1)a^{n-1} and a_{n-2} = (n - 2)a^{n-2}; 
substituting in a_n - 2aa_{n-1} + a^2a_{n-2} = 0, we have 


na^n - 2a · (n - 1)a^{n-1} + a^2 · (n - 2)a^{n-2} = [n - 2(n - 1) + (n - 2)]a^n = 0. 
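
The substitution in answer 21 can be spot-checked with exact arithmetic. This is a small sketch: the function name `residual` and the test value a = 3/2 are arbitrary choices for illustration, and a finite check is of course not a proof.

```python
from fractions import Fraction

def residual(a, n):
    """Value of a_n - 2*a*a_(n-1) + a^2*a_(n-2) when a_k = k * a**k."""
    def term(k):
        return k * a ** k
    return term(n) - 2 * a * term(n - 1) + a ** 2 * term(n - 2)

# Fractions keep the arithmetic exact, so a true zero is not blurred by rounding.
checks = [residual(Fraction(3, 2), n) for n in range(2, 10)]
all_zero = all(r == 0 for r in checks)
print(all_zero)
```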


Section 4.6 


1. Min 40x, + 55x5, 50x, + 100x, = 500, 100x, + 100x, = 800, 500x, + 
700x, = 8000, x, = 0, x, = 0. 3. (a) Max 40x, + 30x,, x, + x, S 400, 
5x, + 3x, = 500, [Sx, + 20x, = 4000, x, =0,x,=0; (b) x, = 0, 
= 166%, income = $5000. 5. x, = 0, x, = 1663, cost = $4166.67. 
7. Min x, + 2X; + 3X43 + 2X, + 3x%q + 2ko3 + 2x5, + 4X5. + 3x53, x, + 
Xipg + X13 = 1500, xo, + X29 + X, = 2000, x5, + x3. + x33 = 2500, x,, + 
Xp, + Xz, = 1000, x1, + X22 + X32 = 2000, x13 + X43 + X33 = 3000, x, = 0. 
9. Min 2; 2, a,x, [where a; is entry (i, j) in hours matrix and a; = ‘‘—’’ means 
that term is not in the sum], x,,; + xX). + X;5 = 1, X29 + X93 + Xo + M5 = I, 
Xz, + Xap + Xz = 1, Xgy H+ Xgn + Xgq H+ Mags = 1, X53 + Xsg + X55 = 1, 2%, + 
X3, + Xa, = 1, Xj2 + X22 + X90 + Xqp = 1, Xo3 + Xyzy + X53 = 1, Hyg + Xqg + 
X54 = 1, X45 + X25 + Xq5 + X55 = 1, x, = Dor l. 
11. X,; = money invested in A at start of ith year, same for x,;, u; = money not 
mer in ith year: Max us + 1.4x,44 + 1.7xg, + 2x¢ + 1.3Xp, X4; + Xp, + 
= 10,000, x4. + Xg. + Xe + Uy = Uy, X43 + Xpz + Uy = Uy + 1.4X,4), 


sa + u, = uy + 14X49 + 1.7Xg), X45 + Xp + Us = Ug + 1.4x43 + 1.7X p>, 


all variables = 0. 
15. (a) x, = 0, x, = 0, x, = 6, 24; (b) x, = 7, x, = 0, x, = 1, 23; 

(ce) x, = § x = § x, = 0, ¥s @) x = 3,55 = 0% = 0; G. 

17. x, = 0, x, = 10, x, = 30, x, = 0, x5 = 20, 1530. 

19. Delete last equation and [by subtracting equations (5), (6), and (7)] change 
first equation to Xj4 — Xy; — X%2 — X23 — X3, — X32 — X33 — Xqy — XQ — 

X43 = —2. Inequalities are x5; + Xo. + X25 + Xz, + X32 + X33 + Xqy + XQ + 
Xgy & 2, Fy HX H+ 5 SB bey + Xo F553 S let t+ hee t+ te SI, 

Xn, + gy + Xqy S 1, Xyq + X32 + Xqq S 1, Xy3 + Xy3 $+ Xy3 S 1, x = O. 

21. (a) 60x,, + 30x, + 2900; 

(b) x,, = 0, x, = 10% = x% = 20, x4, = 15, x, = 0). 

23. If 1 acre less planted: $20 less income, 1 acre more of corn, 2 acres less of 
wheat; if $1 less used: $2 less income, .1 acre less of corn, .3 acre less of wheat. 
25. If 1 unit less of metal, $500 less profit, { more cars, 3 less trucks; if 1 unit 
less of labor, $125 less profit, 1s less cars, #3 more trucks. 


Section 4.7 


1. (a) 3; (b) -1; (c) 1; (d) 1; (e) 7/2; (f) no solution; (g) 3; 
(h) -1. 3. (a) 4, 2; (b) 3, 3.45, -1.45; (c) -1.18, -.15, 1.33; 
(d) -.72, 1.22, -.25 ± 1.03i (imaginary). 5. (a) 68; (b) 65. 

7. (a) 9.4; (b) (i) 8.4, (ii) 8.1. 9. (a) (i) 4, (ii) F; (b) 64 (exact); 


(c) 53.8; (d) 8.4. 11. y_k = -.5(k - 100), 1 ≤ k ≤ 99. 
Section 4.7 Appendix 
1. s(3.1) = -.98 versus true value -.96, s(3.9) = -1.18 versus true value 


= 3.20. 3. (a) s(.1) = .308 versus true value of .309, s(.65) = .890 versus 
true value of .891; (b) integral = .636. 5. (b) Spline approximation 
— .047 versus true integral —0.45. 


Chapter 5 


Section 5.1 


1. (a) Line 2x + 3y = 10; (b) line x - 2y = -4. 3. (a) No solution; 
(b) one solution; (c) infinite solutions. 5. (a) [2, 1]; (b) [1, 0, -2]; 
(c) [5, 4, -7]; (d) [1, -7, 5]; (e) [3, 1, -1]; (f) [0, 0, 0] (invertible 
matrix); (g) [0, 0]. 7. (a) [0, 0, 10] + r[1, 1, -1]; (b) GB, 3, 71. 

9. (a) [5, 0, 5, 0, 5] + r[1, -1, -1, 1, 0] + s[1, 0, 0, 0, -1]; 

(b) [10, -16, -5, 10, 10]; (c) [20, 5, 10, -35, -5]. 

11. (a) r{2, 11, —15); (b) (8, F, 10); (©) (15, —48, FI. 

13. 3SO2 + 2NO3 + 2H2O → 4H + 3SO4 + 2NO. 15. (a) r[1, -2, 1]; 
(b) r[1, 1, -2]; (c) r[1, 0, -1]; (d) r[1, -2, 2, -2, 1]; 

fe) rit, —2, 1, 1...—-2,.1: © 0:0, 070.0, 0) 

ri, ht, & =), =—1, 1h 19. (a) [5, —10]; (b) [5, 0,] + r{2, 1). 
21. (a) [5, 2, 3]; (b) (§, —3, 0, 0) + r{2, 1, —1, 0) + s{—1, 2, 0, 1). 

23. For all probability vectors, p ≥ 0 and p_1 + p_2 + ··· + p_n = 1. 

(a) p» = 0; (b) ps = po, (eC) p3 = Pi; (A) py + Ps + Ps = Pr + Pa’ 

(e) 2p. + 2ps = py + Ps + Px + Po; (£) mone; (g) py + py + ps = pr + 
Ps + De; 25. For all (finite) powers, p, + p3 + ps = po + ps + De: 

27. Ax' = A(cx_1 + dx_2) = cAx_1 + dAx_2 = cb + db = (c + d)b = b. 
Section 5.2 


1. x = a X> “4 ar X3 = s. 3. (a) #2, 1} =~ 4(2, =i}; 
(b) 8(2, —3] + ¥[-3, 6]; (© §1, 3] — ¥[-2, 3); 
(d) [3, 3, 0] + r[3, I, =—2}. 


An te er = s 3]. =-1 — 8 -3 
5. (a) A -|! i} (b) A eis (c) A -|_3 | 


7. (a) Any column, rank 1; (b) first two columns, rank 2; (c) any two 
columns, rank 2; (d) first two columns, rank 2. 9. (a) Col: [-1, 2], 

Null: [5, 3], rank 1; (b) Col: [2, 1, 1], [1, 2, 1], Null: [-3, -1, 1], rank 2; 
(c) Col: first three cols., Null: [-1, 0, 0, 0, 1], [1, -1, -1, 1, 0], rank 3; 
(d) Col: first two cols., Null: [-2, 1, 1, 0, 0], [1, -2, 0, 1, 0], 

[-1, -1, 0, 0, 1], rank 2; (e) Col: all but fourth column, 

Null: [1, 1, -1, -1, 0, 0], rank 5. 


ll. N = he! where I is (n — r)-by-(n — r). 


13. (a) af = a® + a®, af = —fal + 2aS; (b) af = af + 208, 

aS = 2aC — af; (c) af = af — af, af = —af + 2af. 

15. (a) Obtain A from A* by reversing steps in elimination by pivot. 

17. (a) Rows in final upper triangular matrix U are linearly independent unless 
some row is all 0's (by Exercise 16(b)). Since U is derived from A, if U's rows 
are linearly independent, A's rows are linearly independent; (b) columns 
linearly dependent ⇔ rows linearly dependent, now use part (a). 19. (b) The two 
rows of A; (c) first four rows of A. 21. (a) Rank([A, b]) = rank(A) means 
b is linearly dependent on columns of A, that is, b is in Range(A); (b) if b is not 
in Range(A), then [A, b] has one more linearly independent vector (namely, b) 
than the columns of A. 


23. [x' x''] in Null([A -B]) ⇔ [A -B][x' x'']^T = 0 ⇔ Ax' = Bx'' (= d). 


* 2 | 0 
29. (a) E |= |i fee a+ [Of et 1]; 


a l 0 
Gis 4°31) =P pet 2.8 tele he i i. 
ie <9 l 6 


31. Col (a * b) = all multiples of a, Row (a * b) = all multiples of b. 


Section 5.3 


1. c(A) = 9.2 (sum norm). 3. (a) y = —.46x + 9.53; (b) 392; 
(c) ¥ = —.46x + 63. 
F fhe 0826 —.0367 


0162 —.038] pach e Were pees 


7. (a) (8.8; (b) Al, 2,3; © oF t.. ab 


2 l 0 =] 081 .098 404 —.124 
(d) &| 0 Oe 1]; (e) 098 —.024 098 71}. 
l 0 l 2 — .097 2055 = 065 129 
9. (a) w = [5.64, 13.88]; (b) .13 day less; (c) .17 day less. 


= 5D) ven 6 68 - =. 11 
M1. (a) | —.99 46 2 . =24 2.03 }, c(X’ X) 


3390; 
2.53 2.34 -—2.83 15 —-1.8 


(b) § = .47GPA.,, + 1.88GPA,; — .49; (c) e = [—.1, 0, —.15, .11, .14]. 
13. § = 3.92x3 — 9.86x? — 1.13x + 8.5, c(X7X) = 6850. 

15. X* = {1/(x- x)}x and g = X*y = [{1/(x- x)}x] -y = x° y/x x. 

17. (a) 1/√2, 45°; (b) .28, 74°; (c) 1/√2, 45°; (d) 0, 90°; (e) .47, 62°; 
(f) .62, 52°. 19. (a) -.265; (b) .080; (c) -.765. 

21. (a) [1, -2, 0, 2, -1]; (b) [-1, 3, -1, -2, 1]; 

(c) [1, -1, 1, -1, 1, -1]. 

23. (a) |a - b|^2 = |a|^2 + |b|^2 - 2|a||b| cos θ ≤ |a|^2 + |b|^2 + 2|a||b| = 

(|a| + |b|)^2. 

25. If v* = Σ_i r_i v_i, w* = Σ_j q_j w_j, then v* · w* = Σ_i Σ_j r_i q_j v_i · w_j = Σ_i Σ_j 0 = 0. 
27. (a) v, = [1, 2], w, = [-2, 1); @) v, = U1, 2, 3], w, = (3,9, —1), 
w, = (2, —1, 0); (c) v, = [1, 2, 1], v, = [0, —1, 1], w, = [3, -—1, —1); 
(d) v, = [2, 1,0, —1], v> = [0, 1, —2, 1), v3 = [1, 0, 1, 2], 

w, =[—1, 2; 1.405 ‘© &= 1,5,2,.-8..s = 2.1, t+ 

v, = [-1, 3, 0, 1], w, = [.54, .24, —.2, —.16]. 29. x = ff, #, 41. 


Section 5.4 


ao, 2-2 4 
1. (a) j | (b) 4] 2 fr —21x = 1; —2, 11). 
8 6 ' 5 


2 


3. First column, a_1, must be e_1 (it is zero below the main diagonal); a_2 is nonzero in its 
first two positions and a_2 · e_1 = 0 → a_2 = e_2; etc. 5. (a) Weights are 3, #, 4; 
(b) weights are 4, $, 4; (c) weights are .7755, .1633, 3265. 
11. (a) 93° (close to orthogonal); (b) 37°; (e) 173°. 
13. (a) (1/√2)[1, 1], (1/√2)[1, -1]; (b) (1/3)[2, 1, 2], (1/√369)[14, -2, -13]; 
(c) (1/√11)[3, 1, 1], (1/√330)[-7, 16, 5], (1/√177870)[-77, -154, 385]. 
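
The orthonormal sets in answer 13 can be verified with integer arithmetic alone: each pair of direction vectors should have dot product zero, and each squared length should equal the number under the square root. The sketch below does this for parts (b) and (c). The reading of the first vector in (b) as (1/3)[2, 1, 2], with squared length 9, is inferred from a garbled scan.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Direction vectors and squared lengths as read from answers 13(b) and 13(c).
sets = {
    "13(b)": [([2, 1, 2], 9), ([14, -2, -13], 369)],
    "13(c)": [([3, 1, 1], 11), ([-7, 16, 5], 330), ([-77, -154, 385], 177870)],
}

def is_orthogonal_with_lengths(vectors):
    """True if every pair is orthogonal and each squared length matches."""
    for i, (u, length_sq) in enumerate(vectors):
        if dot(u, u) != length_sq:
            return False
        for v, _ in vectors[i + 1:]:
            if dot(u, v) != 0:
                return False
    return True

results = {name: is_orthogonal_with_lengths(vs) for name, vs in sets.items()}
print(results)
```

Working with the unnormalized integer vectors avoids floating-point square roots, so the comparisons are exact.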


ee et, | eee z 4 ie 
an ek ! all civ 0 Al > 

13 2 -§5 
Dae | 

—3 0 3 -3 -]1 l 3 

now ' hod? | 
(c) oo : if 
19. A^+ = (A^T A)^{-1}A^T = [(QR)^T QR]^{-1}(QR)^T = [R^T Q^T Q R]^{-1} R^T Q^T = 
[R^T R]^{-1} R^T Q^T = R^{-1}(R^T)^{-1} R^T Q^T = R^{-1} Q^T. 
25. (a) f(x) >= — phe + .856x?; (b) f(x) = #% + 43x’: 


(c) f(x) = 2.80x — 2.17x; 27, (a) L(x) = x* — 6x7/7 + &:; 
(b) L(x) = © — 10x°/9 — 5x/21; 


29. (a) K,(x) = © — 3x2/2 + 3x/5 — 


(b) 2x? — 1.286x7 + .296x — .014 


900 —2520 1680 
| 361; (b) | —2520 7350 —35040 |, 9195; 
1680 —5040 3528 


648 —720 


sadears Be $10 


16,201 —39,603 23,162 
(c) | —39,603 98,018 —59,405 |, 66,222. 
23,762 —59,405 36,303 
1 olf4 ojf 1 0 
ROS sie all yak 
46 1][ 5.4 O|{.52 76 
| 1}/}4 O]4 43 ! 
(c) oe a 0 4 ; - if (d) defective matrix, only one eigenvector. 
9. (a) λ_1 = 4.67, u_1 = [.82, 1, .21], λ_2 = -1.79, u_2 = [1, -.93, .52], 
λ_3 = .12, u_3 = [-.41, .12, 1], 


1.83 2:23: 47 — .84 18 —.44 Ol =—O1 —.03 
kite Rib oop? so" = o94 41} + | —.0] .O2 .O1 
4S 8638 (AZ — .44 Al = 222 = 5 Ol 10 


(b) λ_1 = 2.73, u_1 = [1, 1, .73], λ_2 = -.73, u_2 = [-.37, -.37, 1], λ_3 = 0, 
u_3 = [-1, 1, 0], 


1.08 1.08 .79 — US =.68 21 
1.08 1.08 .79} +] -—.08 —.08 7) 
Mee» re Y 21 eu = 337 


(c) λ_1 = 4.71, u_1 = [-.56, 1, .1, -.52], λ_2 = -1.97, 
u_2 = [-.12, .54, -.78, 1], 
93 -1.65 —.17 86 —~.01 .07 —.10 12 
-1.65 2.95 .29 —1.54) | 07 -.30 .43 —-.56 
—.17 Sey i = 5 —-10 43 -.63 80 
86 —.154 —.15 80 12 -—.56 .80 —1.03 
19558 By 7 13 01 —.21 —.16 
, [58 35.29 10] | O01 .00 —.02 —.OI 
48 .29 .24 .08 —.21 -.02 .36 } .26 
TF 10° 08: -.16 -.01 .26  .20 
11. $\lambda \approx 1.387$, $u = [.11, .267, .577, .764]$, first half of $u_1 = \sqrt{2}u$, $\lambda u * u$ is
upper right 4-by-4 submatrix of (28).
9 10 17 pen k: 
Siwy 72 S61. tok tt A ee fair fit. 
ee 2.5 16 40 3.9 
ft oS 24 VS 39 38 
2/V5 2 
17. (a) | 1/V5 | (V5]U1]. | 1 |: 
0 0 
OF & 253 ae A 7 -A 
74. 0 1) .S2.° 6 
(b) | .61 .64 ig call ps a) 21 3.51/+!1 9 —5k 
19 —.55 2.7 4.6 — 
29 <§2 HG me fe 1.7 —-1.5 
fc) 16 —.401|6.72 0O 11.65 .76] 13.3 3.9 =13, “Lay 
44 -—.28]| 0 4.34]1.76 —.65]’11.9 2.2 —.9 8] 
39 ~=—.70 1:7 29 2.3 -2.0 
34 -.85 3 AS 5: Sa 
635 G. 11.37 82 
(d) | .55 —.17 - || ee x nl og) Maa ae WSS AS a 
J6é+ 30 2.8 4.1 y= 
30 86.39 —.23 
19. (a) (3, 5, O}; (b) ke 13 sf 
12  .003 —.006 16 = eh 
‘c) i 15.09 ist “ Te i! 
Section 5.5 Appendix 
1. f) k, = 7,8, = 11,1 %.= —5,8 = 1 —1tk @® 2X, = 5, 
ui, = 1, —}i, A> = —J, u, — 1, 1}; (c) A, = Be u; = [l, ll, A> — —]. 
u, = [1, —1]; (@) A, = 5.06, uw, = [.62, 1, .17], \. = 3.14, 
u, = [—.92, .4, 1], A; = 3.08, u, = [.81, —.66, 1]; (©) A, = 2.41, 
a, = (71,-71, 1, = 1. = T= 1.1.0), by = = 41, | 
$u_3 = [-.71, -.71, 1]$; (f) $\lambda_1 \approx 4.71$, $u_1 = [-.56, 1, .1, -.52]$,
$\lambda_2 = -1.97$, $u_2 = [-.12, .54, -.78, 1]$, $\lambda_3 = 1.58$, $u_3 = [1, .61, .50, .18]$,
$\lambda_4 = 0.68$, $u_4 = [-.60, -.05, 1, .74]$.
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Affine linear transformation, 256 

Alphabetic code; see Coding models 
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Anton, H., 511 
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Associative rules in matrix algebra, 
120, 121 
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Braun, M., 512 
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460 

Cayley, Arthur, 508 

Cayley—Hamilton theorem, 174 
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Change of basis, 457 
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Characteristic equation of a differential 
equation, 295 

Characteristic equation and polynomial 
of recurrence relation, 332 

Characteristic polynomial of a matrix, 
165 

Chemical equation balancing, 288, 398 

Cheney, W., 512 

Chicken farm optimization model, 341 

Clark, V., 512

Coding models 

alphabetic, 48, 66, 200 
binary, 103, 104 

Column space of a matrix, 414, 420, 
427 

Column vector, 62 

Commutative rules in matrix algebra, 
83, 119, 121 

Complete pivoting, 245 

Computer/dog growth model, 131, 
133, 134, 204, 215 

Computer graphics, 255 

Computer job processing problem, 72, 
75, 80 
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441, 460, 466, 472, 509 
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Conte, S., 512
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of a 2-by-2 matrix, 159 
of a 3-by-3 matrix, 163 
product rule, 165 
Diagonal matrix, 119, 205, 480, 492 
Diagonalization of a matrix, 205, 480, 
492 
Dietician’s problem (a linear program), 
339 
Differential equations, 293, 391 
discrete approximation, 374 
system of differential equations, 296 
Differentiation transformation, 390, 
403 
Digital image, 55, 87, 486, 494 
Dimension of a vector space, 420, 423, 
448 
Discriminant, 158 
Distributive rules in matrix algebra, 
120, 121 
Dominant eigenvalue and eigenvector, 
217 
Dot product, 72 
Dunn, O., 512 
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Leontief economic model 
Edge in a graph, 98 
Eigenfunction, 391 
Eigenvalue, 133, 205, 296, 323, 388, 
391, 403, 499 
complex, 325, 492 
determining eigenvalues, 165, 216, 
484, 501, 502 
Eigenvalue decomposition, 206, 482 
Eigenvector, 133, 166, 388, 389 
coordinates to represent other vec- 
tors, 134, 168, 265, 298, 388, 
479 
determining eigenvectors, 166, 216, 
484, 501, 502 
Electrical network, 289 
Elimination; see Gaussian elimination 
and Elimination by pivoting 
Elimination by pivoting, 185, 197 
Equilibrium, in economic model, 15 
Error bounds, 131, 246 
Error-correcting code, 104 
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Error-detecting code, 103 

Error space, 446 

Errors in elimination computations, 
242, 246 

Euclidean norm for matrix, 128, 174, 
175 

Euclidean norm of a function, 467 

Euclidean norm for vector, 127, 139 
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Falling object, equation of, 2 


Feasible region and feasible points in a 
linear program, 42, 338, 362, 363 


Fibonacci relation and Fibonacci num- 
bers, 330 

Fill-in during elimination, 241 

Filtering a digital image, 55, 87, 147 

Filtering a time series, 51 

Finite difference approximation of dif- 
ferential equation, 374 

Fourier series, 469 

Frog Markov chain, see Markov chain 
for frog 

Function space, 389, 466 

Functional approximation, 467, 469 

Fundamental matrix of an absorbing 
Markov chain, 313 
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Gambling Markov chain, 309 

Gantmacher, F., 513 

Gass, S., 513 

Gauss, Karl Frederick, 273, 507 

Gaussian elimination, 176, 179, 273 
computational complexity, 236 
history, 507 
for tridiagonal matrix, 237 

Gauss—Jordan elimination, 185 
history, 507 

Gauss—Seidel iteration, 235 
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Gram-Schmidt orthogonalization, 461
Graph, 98 

Graphic transformations, 255 
Grassmann, Hermann, 509 

Graybill, F., 512 

Grossman, S., 511 
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Hamming code, 104 

Harvesting model, 326 

Hax, A., 513 

Heat equation, 375 

Helzer, G., 512 

Herman, E., 514 

Hidden surfaces, in computer graphics, 
265 

Hilbert matrix, 473 

Hildebrand, F., 512 

Hillier, F., 513 

Hoel, P., 513 

Homogeneous system of equations, 397 

Householder transformation, 463 


I 


Identity matrix, 113 

Ill-conditioned system of equations, 8,
461 

Incidence matrix of a graph, 107, 125 

Inconsistent system of equations, 404 

Independent set in a graph, 107 

Independent set of vectors, 416 

Independent variables in linear pro- 
gram, 344 

Inner product, 72 

for functions, 466 

Input constraint in Leontief economic 
model, 17, 115, 220

Input values, 2 

Integer program, 108 

Integral, approximations of, 372 


Geodesy and geodetic survey, 273, 507
Geometric series for matrices, 221, 312
Geometry of system of linear equations, 394
Gewirtz, A., 511
Goldberg, S., 513
Goldstein, Herman, 509
Grades, regression model, 38

Integral transformation, 390
Inverse of a matrix, 193, 203, 329
computing an inverse, 197
formula for inverse of 2-by-2 matrix, 197
with orthogonal columns, 456
Inverse iterative method for eigenvalues/eigenvectors, 501


Inverse of a linear transformation, 390 

Invertible matrix, 194 

Iterative solution for eigenvalue/eigen- 
vector, 25, 217, 501 

Iterative solution of Markov chain, 25, 
310 

Iterative solution of matrix equations, 
16, 226, 230 
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Jacobi iteration, 229 
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Karmarkar’s linear programming algo- 
rithm, 350 

Kemeny, J., 513 

Kernel of a matrix, 393 

Khachian’s linear programming algo- 
rithm, 350 

Kincaid, J., 512 

Kirchhoff’s current and voltage laws, 
290, 292 

Kolman, B., 511 

Kumpel, P., 511 


L 


Least squares approximation, 274, 434, 
438, 468 

Legendre polynomials, 467 

Leibnitz, G. W., 507 

Length of a path, 99 

Leontief economic supply-demand 
model, 14, 64, 76, 114, 124, 131, 
180, 220, 224 

Leslie population model, 322, 333, 502 

Lieberman, G., 513 

Lincoln, Abe, bust of, 494 

Linear combination, 6 

Linear dependence of vectors, 415 

Linear independence of vectors, 416 

Linear model, 6 

Linear program, 41, 77, 337, 428 

Linear program for planting crops, 41, 


Linear regression; see Regression 
Linear transformation, 256, 387 
Long, Cliff, 494 
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Lower triangular matrix, 164, 184 
LU decomposition, 184, 188, 207, 
237, 427, 504, 509 
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Magid, A., 512 
Magnanti, T. ae 
Magnenat-Thalman, N., 512
Markov chain, 21, 78, 85, 116, 130, 
303 
Markov chain for frog, 23, 64, 78, 86, 
116, 134, 201, 238, 304, 400, 
407 
absorbing Markov chain, 309 
regular Markov chain, 307 
Markov chain for weather, 22, 84, 133 
Mathematical model, 2 
Matrix, 61 
Matrix addition, 67 
Matrix algebra, rules of, 119 
Matrix exponential, 296 
Matrix multiplication, 81, 425 
in adjacency matrices, 100 
computational complexity of matrix 
multiplication, 142 
noncommutativity of matrix multipli- 
cation, 83 
in partitioned matrices, 145 
in terms of simple matrices, 425 
Matrix norms, 128 
Matrix notation, 61, 508 
Matrix product of vectors, 424 
Matrix-vector product, 74 
Max norm of matrix, 128 
Max norm of vector, 127, 139 
Mean of a data set, 50 
Membership vector, 107 
Mendelhall, W., 512 


Mesh, mesh points, 372, 376


Model, dynamic versus static, 21 
Model, linear, 6 
Model, mathematical, 2 
Multiple solutions, 394, 449 
Multiplication 

matrix, 81 

matrix-vector, 74 

vector, 72 
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Newton's method for finding zeros of a 
function, 367 
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Nicholson, W. K., 511

Noble, B., 512 

Node in a graph, 98 

Nonlinear regression, 281 

Nonsingular matrix, 194 

Norms of a matrix; also see Euclidean 
norm, Sum norm, Max norm, 128 

Norms of a vector; also see Euclidean 
norm, Sum norm, Max norm, 127 

Null space of a matrix, 393, 396, 423, 
449 
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Oil refinery problems; see Refinery 
problems 

Ones vector, 112 

Optimal refinery production problem, 
41

Orthogonal columns, 442, 455 

Orthogonal polynomials, 467, 469 

Orthogonal vectors, 437, 445, 455 

Orthonormal basis, 457, 461, 467 

Orthonormal vectors, 457, 461

Outlier, 279, 447 
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Parity-bit code, 103 

Partitioning a matrix, 143, 312 

Path in a graph, 99 

Pattern recognition, 54 

Piecewise approximation, 372, 380 

Pivot on entry, 186 

Pivot exchange in linear program, 346 

Pivot matrix, 364 

Pivoting, 186, 245, 346, 361 

Plemmons, R., 513 

Polynomials, approximating, 440, 467, 
469 

Population growth models, 321; also 
see Rabbit/fox growth models 

Port, S., 513 

Predicting grades, regression model, 38 

Preparata, F., 512 

Principal component analysis, 489 

Principle for multivariable problems, 
12 

Probability distribution, in Markov 
chain, 22 

Projection, 261, 276, 435, 437, 457, 
461 
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Pseudoinverse of a matrix, 439, 441, 
458, 466, 471, 493 
with orthogonal columns, 443 
with orthonormal columns, 458 


Q 


QR decomposition, 464, 502, 509 


R 


Rabbit/fox growth model, 25, 90, 123, 
130, 162, 166, 207, 388 

Rabbit/fox nonlinear growth model, 30 

Raleigh quotient, 219, 501, 503 

Range of a matrix, 393, 405, 414, 
420, 422, 448 

Rank of a matrix, 420, 423, 427, 448 

Ranking teams, 101 

Recurrence relation, 329 

Refinery production, basic problem, 
12, 63, 73, 163, 177, 182, 199, 
227, 249, 389, 414, 426 

Refinery production, with two prod- 
ucts, 40, 397 

Refinery production, with two refiner- 
ies, 37, 406, 434, 439, 493 

Reflection, 268 

Reflection transformation, 390 

Regression, 39, 273, 404, 435, 438, 
444 


Regular Markov chain, 306 
Revised simplex algorithm, 366 
Rogers, D., 512 

Rorres, C., 512 

Row space, 423 

Row vector, 62 
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Scalar, 67 

Scalar factoring in matrix algebra, 120 

Scalar multiplication, 67 

Scalar product, 72, 437, 444, 455, 466 

Scaling before pivoting, 244 

Sensitivity analysis in a linear program, 
352 

Shamos, M., 512 

Sherbert, D., 511 

Shift transformation, 390 
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Shifted inverse iterative method for 
eigenvalue/eigenvectors, 501 

Similar matrices, 497 

Similarity measure, 118 

Simple matrix, 424, 427, 482 

Simplex algorithm of linear program- 
ming, 346, 361 

Singular matrix, 194 

Singular value decomposition, 492 

Sitomer, H., 511 

Slack variable, 344 

Smith, G., 512 

Smoothing time series, 50 

Snell, L., 513 

Software for matrix computation, 513 

Sparse matrix, 142, 147 

Spiegel, M., 513 

Spline, 374, 380 

Stable distribution in a Markov chain, 
25, 134, 238, 304, 305 

Stable elimination, 245 

Stable pivoting; see Stable elimination 

Statistics; see Correlation coefficient, 
Principal component analysis, 
Regression 

Stewart, G. W., 503, 512 

Stone, C., 513 

Strang, G., viii, 511, 513

Sum norm of matrix, 128 

Sum norm of vector, 127, 139 

Sylvester, J. J., 508 

Symmetric matrix, 99, 117, 136, 174, 
483, 501 
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Test scores, 67 

Thalman, D., 512 

Thorpe, J., 511 

Time series, 50 

Transition diagram, in Markov chain, 
22 
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Transition matrix of a Markov chain, 
22, 130 

Transition probabilities, in Markov 
chain, 22 

Transportation problem, 340, 351, 401, 
417 


Transpose of a matrix, 117, 437, 448,
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Trapezoidal rule, 372 
Tridiagonal matrix, 147, 238, 376, 383 
Tucker, A. W., 511 
Turing, Alan, 509 
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Umbrae, 508

Undetermined system of equations, 40 

Unique solutions, 204, 394, 396, 408, 
423 

Unit vector, 121 

Upper triangular matrix, 164, 184 
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Van Loan, C., 512 
Variance of data, 489 
Vector, 62 
as a point in space, 65 
Vector calculus, 277, 283 
Vector multiplication, 72 
Vector norms, 127
Vector space, 389, 393, 414, 466, 509 
Von Neumann, John, 509 
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Weather Markov chain; see Markov 
chain for weather 
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